Abstract

In this paper we discuss techniques of exploiting the obvious relationship between program
structure and data structure for program generation. We develop methods of program
specification that are derived from a representation of recursive data structures in the
Logic for Computable Functions (LCF). As a step towards a formal problem specification
language we define definitional extensions of LCF. These include a calculus for
(computable) homogeneous sets and restricted quantification. Concepts that are obtained by
interpreting data types as algebras are used to derive function definition schemes from an LCF
[...] for the simplification of [...]

1. Introduction

In  this  paper  we  are  concerned  with  the  use  of  data  structures  in  generating  correct  programs  from 
formal  problem  statements. 

Present  experimental  systems  for  automatic  program  synthesis  (see  [BuL],  [MaW]  for  recent  work) 
are  based  on  a rather  large  amount  of  knowledge  in  the  form  of  individual  axioms  and  problem 
solving  methods.  At  each  step  in  the  synthesis  process  the  system  has  to  search  for  an  applicable 
piece  of  knowledge  in  the  data  base.  One  of  the  main  problems  is  the  automatic  construction  of 
iterative  loops  or  recursive  calls.  However,  it  can  be  observed  that  the  structure  of  the  data  is 
reflected  more  or  less  in  the  structure  of  any  program  operating  on  them,  both  in  the  analysis  of 
subcases  and  (iterative  or  recursive)  loops.  In  fact,  if  a recursion  or  iteration  is  possible  (and 
reasonable)  at  all  it  is  because  of  a corresponding  data  structure.  So  it  is  safe  to  say  that  the 
generation of a program is always guided by an underlying domain structure. Thus, by
"strengthening" the guidelines we can avoid the system having to "retrieve" anew the underlying
structure each time it is synthesizing a program. Organizing the knowledge about the data domain
and  representing  it  in  such  a way  that  it  directly  assists  a system  in  constructing  a program  can 
possibly  eliminate  some  complicated  problem  solving  processes. 

In  the  case  of  recursive  data  types  the  relationship  between  program  structure  and  data  structure  is 
particularly obvious. For this kind of data types the Logic for Computable Functions (LCF) [Mi1,
Mi2, WM] provides a natural basis for reasoning about program generation, since both the problem
and the prospective structure can be expressed in the same formal system. Obviously, the crucial
point is to find an appropriate representation of the data structure. A large portion of this paper is
devoted  to  this  problem;  it  attempts  to  develop  a sufficient  mathematical  framework  for  dealing  with 
abstract  data  types  within  LCF.  Based  on  this  theory  methods  of  function  specification  are 
investigated  that  are  directly  derivable  from  the  data  structure  representation  and  do  not  require 
general problem solving methods. They include extensions of the term language of LCF, in
particular  a calculus  for  (a  restricted  kind  of)  sets  and  restricted  quantification,  and  certain 
"definition  schemes";  both  kinds  are  based  on  concepts  obtained  by  interpreting  data  types  as 
algebras. 

The  definition  techniques  are  meant  to  be  a step  towards  a "problem  specification  language"  that 
allows easy and concise definition of functions on a level of abstraction that is close to the intuitive
conception of the user. This approach to program specification bears a resemblance to what has
been called "very high level" or "nonprocedural" programming languages. Indeed, programming
language features similar to some of the constructions to be discussed here have been proposed
elsewhere (e.g. [Ea]) and are available in SETL. However, we are not dealing with a programming
language, but a formal system that permits formal reasoning. Emphasis is given to interpreting the
added constructions in terms of LCF in order to make feasible meaning-preserving transformations
of expressions. Only the fact that every LCF term also has an interpretation as a computation rule
for the function denoted by it allows us to regard it as a kind of program.

The following section provides the logical and mathematical framework as needed in the subsequent
sections. It gives a short overview of the type-free version of LCF and the mathematical theory of
subdomains. Section 3 discusses the axiomatization of abstract data types, their representation in
LCF, and the interpretation of types as heterogeneous algebras. Section 4 is devoted to introducing
elements of a specification language, which include (computable) sets, set operations and bounded
quantification. The algebraic concepts of section 3 lead to methods for defining and simplifying
functions over data types. In section 5, the definition methods are demonstrated in an example that
is based on the data types of LCF terms and is taken from an LCF implementation. Finally, possible
directions of future work are indicated in the concluding section.

The paper is intended to be essentially self-contained. The letters "T.P." that can often be found
instead of a proof are meant to indicate that a proof has been generated by means of the interactive
theorem prover for LCF. The amount of user interaction required to generate a proof is not
indicated; in general, the proofs for simple lemmas can be generated fully automatically. The
automatic theorem prover component of the system employed for proof generation will be described
in detail in a forthcoming paper [He].

2. The type-free Logic for Computable Functions

The Logic for Computable Functions (LCF) was invented by D. Scott (unpublished) and, in a
modified form, mechanized by R. Milner [Mi1, Mi2]. Using this interactive proving system the logic
has subsequently been applied to various problems in the Mathematical Theory of Computation:
schematology, formalization of syntax and semantics of programming languages, proving properties
of programs and the correctness of interpreters and compilers (cf. [AAW, N2] for more recent work
on PASCAL and LISP and comprehensive references). In these experiments LCF proved very
useful for formalizing and proving problems involving (possibly partial) recursive functions.

In the following the reader is assumed to be familiar at least with the basics of LCF. For the sake of
self-containment a syntax of the language is given in appendix A.1.1.

2.1  Type-free  LCF 

In this subsection the type-free version of LCF (or tfLCF for short) is described briefly as needed
for the further development. This version of the logic was developed by D. Scott, R. Milner and
R. Weyhrauch [unpublished notes]. Most of the material and the ideas presented here are essentially
due to them; part of it can also be found in [Sc2].

Essentially, tfLCF axiomatizes one of Scott's models for the λ-calculus [Sc1]: the domain I, which is
constructed over the 4-element lattice T of truth values:

         ⊤
        / \
      tt   ff
        \ /
         ⊥

The main characteristic of the domain I is that it is isomorphic to its domain of continuous
functions; thus, each element of I can also be regarded as a function from I to I.

The language of the logic itself is essentially the same as for the typed version (see appendix A.1.1),
with two exceptions:

(a) the restrictions for building expressions that result from the types are abolished;

(b) besides the 4 truth values, the language includes constants I for the "universe", i.e. the domain
of the model, and T for the domain of truth values.

The main problem in extending the semantics of expressions to the type-free case is defining the
meaning of the conditional p ⊃ q, r for any term p. This is done by mapping the elements of I onto
the truth values (this will be made clearer in the following subsection). The meaning of ⊤ ⊃ x, y is
not further specified except that ⊤ ⊃ x, x ≡ x. However, it turns out that it can be taken as the join
of x and y (see below).

For details about axioms and inference rules of the logic, see appendix A.1.2.

The element ⊥ is called undefined, and the element ⊤ is called overdefined; all other elements are
called defined. A predicate ∂ can be defined in I such that

    ∂(x) ≡ tt iff x is defined,   ∂(⊥) ≡ ⊥,   ∂(⊤) ≡ ⊤

i.e. ∂ yields the distinction between defined and non-defined elements in I. ∂ is definable in the logic
by a mapping onto the 3-element lattice {⊥, tt, ⊤}. The definition depends on the fact that the truth
values are isolated elements in the lattice I. For details see appendix A.2.1.

A function f is called strict if it returns ⊥ or ⊤ whenever the argument is ⊥ or ⊤ resp., that is, if
both f(⊥) ≡ ⊥ and f(⊤) ≡ ⊤ hold. f is called ⊥-strict if f(⊥) ≡ ⊥, and ⊤-strict if f(⊤) ≡ ⊤. f is called
total if it never returns ⊥ or ⊤ for a defined argument, i.e., if

    ∂(x) ⇒ ∂(f(x)) ≡ tt

holds. Thus, if a function f is strict and total then ∂(x) ≡ ∂(f(x)).

Any function f can be made into a strict one by first applying ∂ to the argument: For

    f' :≡ [λx. ∂(x) ⊃ f(x), ⊥]

we obviously have

            ⊤      if x ≡ ⊤
    f'(x) ≡ ⊥      if x ≡ ⊥
            f(x)   otherwise

In the next section a functional str will be defined that turns any function into a strict one.
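
The behaviour of the definedness predicate and of the strictified function f' can be checked on the four truth values alone. The following Python sketch is only an illustration under simplifying assumptions: it models the lattice T rather than the full domain I, it uses the strict reading of the conditional, and the names V, defined, strict_cond and strictify are hypothetical and not part of LCF.

```python
from enum import Enum


class V(Enum):
    """Four-element truth-value lattice: BOT below TT and FF, TOP above both."""
    BOT = "bot"
    TT = "tt"
    FF = "ff"
    TOP = "top"


def defined(x):
    """Definedness predicate: BOT and TOP map to themselves, all else to TT."""
    if x is V.BOT or x is V.TOP:
        return x
    return V.TT


def strict_cond(p, a, b):
    """Strict conditional: BOT and TOP pass through, TT/FF select a branch."""
    if p is V.TT:
        return a
    if p is V.FF:
        return b
    return p


def strictify(f):
    """Build f' with f'(x) = defined(x) -> f(x), BOT."""
    return lambda x: strict_cond(defined(x), f(x), V.BOT)


def const_tt(x):
    # A non-strict function: it answers TT even for BOT and TOP.
    return V.TT


strict_const_tt = strictify(const_tt)
```

In this model strict_const_tt agrees with const_tt on tt and ff, but returns the undefined and overdefined elements unchanged, which is the case analysis given for f' above.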

In the following we define some standard operators on I that will be used throughout the paper.

    ∘    :≡ [λx y z. x(y(z))]                                       function composition
    pair :≡ [λx y z. z ⊃ x, y]                                      ordered pair
    π1   :≡ [λx. x(tt)]                                             projection onto first component
    π2   :≡ [λx. x(ff)]                                             projection onto second component
    ×    :≡ [λx y z. pair(x(z tt), y(z ff))]                        cartesian product
    +    :≡ [λa b x. π1(x) ⊃ pair(tt, a(π2 x)), pair(ff, b(π2 x))]   disjoint union
    Id   :≡ [λx. x]                                                 identity function
    ⊔    :≡ [λx y. ⊤ ⊃ x, y]                                        join
    ∧    :≡ [λx y. (∂(x) ⊔ ∂(y)) ⊃ x ⊃ y, ff]                       logical and
    ∨    :≡ [λx y. (∂(x) ⊔ ∂(y)) ⊃ x ⊃ tt, y]                       logical or
    ¬    :≡ [λx. x ⊃ ff, tt]                                        negation

For pair(x,y) we also use the notation <x,y>.

The standard properties of these functions are easily derivable; for example,

    ∀x y. π1(<x,y>) ≡ x        ∀x y. π2(<x,y>) ≡ y

Milner and Weyhrauch have shown that ⊔ has all the properties of the join operation in a lattice,
also with respect to the partial order defined by ⊑. In particular, tt ⊔ ff ≡ ⊤ and ⊥ ⊔ x ≡ x for all x ∈ I.

A strict conditional :⊃, i.e. ⊤ :⊃ x, y ≡ ⊤ for all x and y, is definable in terms of the normal
conditional ⊃:

    :⊃ :≡ λz x y. z ⊃ (z ⊃ x, ⊤), (z ⊃ ⊤, y)

Since the normal conditional will not be used in this paper except in the join operation, we will
henceforth use the character ⊃ to denote the strict conditional.

The propositional connectives are strict in all arguments; they extend the standard functions (for
two-valued logic) to four truth values in such a way that the standard relationships like
x ∨ y ≡ ¬(¬x ∧ ¬y) still hold.
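
The derivation of the strict conditional from the normal one can be verified exhaustively on the truth-value lattice. The sketch below is an illustration, not part of the logic: it assumes that the normal conditional yields the undefined element for an undefined test and the join of both branches for an overdefined test, and the names leq, join, cond and strict_cond are chosen here for the purpose.

```python
from enum import Enum


class V(Enum):
    """Four-element truth-value lattice: BOT below TT and FF, TOP above both."""
    BOT = "bot"
    TT = "tt"
    FF = "ff"
    TOP = "top"


ALL = list(V)


def leq(a, b):
    """Partial order of the lattice."""
    return a is b or a is V.BOT or b is V.TOP


def join(a, b):
    """Least upper bound; tt and ff are incomparable, so their join is TOP."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return V.TOP


def cond(p, a, b):
    """Normal conditional, with the overdefined test taken as the join."""
    if p is V.TT:
        return a
    if p is V.FF:
        return b
    if p is V.BOT:
        return V.BOT
    return join(a, b)


def strict_cond(z, x, y):
    """Strict conditional built from the normal one: z -> (z -> x, TOP), (z -> TOP, y)."""
    return cond(z, cond(z, x, V.TOP), cond(z, V.TOP, y))
```

Running strict_cond over all 64 argument triples shows that it behaves like cond for the tests tt, ff and the undefined element, and returns the overdefined element whenever the test is overdefined.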


2.2  Retracts,  Domains,  Types 

The  typefree  logic  essentially  axiomatizes  the  "universal"  domain  I.  However,  one  would  like  to  talk 
also  about  domains  other  than  I,  like  "lists"  or  "integers."  It  turns  out  that  they  can  be  "embedded" 
into  the  universal  domain;  there  are  subdomains  of  I that  correspond  to  those  particular  domains  in 
a sense to be made precise in the following section. As Scott [Sc2] has shown, I is so rich in
subdomains  that  one  can  find  a corresponding  subdomain  for  all  those  domains  or  "data  types" 
computer  scientists  are  normally  interested  in. 

The standard way of defining a subdomain is by using retracts. A retract is an idempotent function,
i.e., an f ∈ I with f ∘ f ≡ f. The idempotency property implies that all elements in the range of a
retract f remain unchanged, i.e. the range of f is exactly its set of fixed points, a subdomain D_f of I
(the range of f). This domain D_f can be shown to be a complete lattice. In the remainder of this
paper, the term "domain" always means "subdomain of I as defined by a retract." Very often the
domain and the function (the retract) defining it will be confused by using the same notation for
both; however, from the context it will be clear what exactly is meant. For emphasis, we will say
retraction if we mean the function in particular.


The category of retracts

It may be helpful to look at retracts from a categorical point of view. The retracts of I form a
category R in the following way:

- The objects of R are the retractions in I.
- A function f ∈ I is a morphism from the retraction r to the retraction s iff f ∘ r ≡ s ∘ f ∘ r.
- Composition of morphisms in R is just composition of functions in I.

Obviously, R is a category. Note that two functions f and g in I will be identical morphisms in R if
they agree on their source domain ("source retract"), i.e. if f ∘ r ≡ g ∘ r. An identity on a retract r is a
function F with F ∘ r ≡ r. We write Id_r for the identity on r.

Let r, s be retracts. D_r is called subdomain of D_s iff s ∘ r ≡ r, i.e. iff the fixed points of r are also
fixed points of s. D_r is called retract of D_s iff s ∘ r ≡ r and r ∘ s ≡ r.
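
The notions of retraction, subdomain and retract can be tried out on a small finite example. The following Python sketch is a toy model under stated assumptions: it uses integer functions on the range 0..9 instead of elements of I, equality of functions is tested pointwise on that range, and all names (is_retraction, is_subdomain, is_retract_of, s, r, r2) are hypothetical.

```python
DOMAIN = range(10)


def compose(f, g):
    """Function composition f o g."""
    return lambda x: f(g(x))


def equal_on(f, g, domain=DOMAIN):
    """Pointwise equality of two functions on the finite sample domain."""
    return all(f(x) == g(x) for x in domain)


def is_retraction(f, domain=DOMAIN):
    """f is idempotent: f o f = f."""
    return equal_on(compose(f, f), f, domain)


def fixed_points(f, domain=DOMAIN):
    return {x for x in domain if f(x) == x}


def is_subdomain(r, s, domain=DOMAIN):
    """D_r is a subdomain of D_s iff s o r = r."""
    return equal_on(compose(s, r), r, domain)


def is_retract_of(r, s, domain=DOMAIN):
    """D_r is a retract of D_s iff s o r = r and r o s = r."""
    return is_subdomain(r, s, domain) and equal_on(compose(r, s), r, domain)


def s(x):
    return min(x, 5)              # fixed points 0..5


def r(x):
    return min(x, 2)              # fixed points 0..2


def r2(x):
    # Same fixed points as r, but a different function above 5.
    return min(x, 2) if x <= 5 else 0
```

Here r and r2 are different retractions defining the same domain {0, 1, 2}. Both define subdomains of D_s, but only r satisfies the second condition r ∘ s ≡ r, so the retract relation depends on the retraction and not only on its set of fixed points.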

A particular retract is the truth value domain T. Trivially, the universal domain I is also a retract.
However, the property of being a retract cannot be proved; the corresponding retractions

    T ≡ [λx. x ⊃ tt, ff]

and

    I ≡ [αJ. T ⊔ (J → J)]

are rather part of the axiomatization of tfLCF. Obviously, "retract" and "retract of I" mean the
same thing.

It should be noted that R is not the category of those subdomains of I that are defined by retracts:
different retractions can define the same domain but will be different objects in R. For example, the
retractions T and istrue (see appendix) both define the domain {⊥, tt, ff, ⊤} but are completely different
functions. However, T and istrue are isomorphic in the category R. Incidentally, if two domains are
isomorphic one of them need not be a retract of the other: for instance, we have

    T ∘ istrue ≡ istrue   and   istrue ∘ T ≡ T

i.e. T and istrue are subdomains of each other, but

    istrue ∘ T ≢ istrue   and   T ∘ istrue ≢ T

Thus neither of T and istrue is a retract of the other. This discussion shows how retractions that
define the same domain are related in the category R:

Corollary 2.1: For retractions r and s, if D_r = D_s then r and s are isomorphic, that is, from a
structural point of view, they cannot be distinguished.

The  category  R has  many  useful  closure  properties  we  are  going  to  exploit. 

Lemma 2.2: R is closed under +, × and →, i.e. if a and b are retracts, then so are a+b, a×b, and a→b.

Proof: by T.P.

Lemma 2.3: R is cartesian-closed.

Proof: We have to prove that [r → [s → t]] and [r×s → t] are isomorphic in R for any retracts
r, s and t. Let F, G be defined by

    F ≡ [λf x. f(π1 x)(π2 x)]
    G ≡ [λg r s. g(<r,s>)]

The T.P. proofs for

    G ∘ F ≡ Id_(r→[s→t])
    F ∘ G ≡ Id_(r×s→t)

are almost straightforward.

Lemma 2.3 is the basis for what is commonly called "currying". It allows us to restrict attention to
monadic functions.
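
The two mappings F and G of Lemma 2.3 can be written down directly with Python closures. This is a sketch under simplifying assumptions: Python booleans stand in for tt and ff, ordinary Python values stand in for elements of I, and the sample functions add and mul_pair are hypothetical test data.

```python
def pair(x, y):
    """pair = [lambda x y z. z -> x, y]: a pair is a function of a truth value."""
    return lambda z: x if z else y


def pi1(p):
    """Projection onto the first component: p(tt)."""
    return p(True)


def pi2(p):
    """Projection onto the second component: p(ff)."""
    return p(False)


def F(f):
    """F = [lambda f x. f(pi1 x)(pi2 x)]: turns a curried function into one on pairs."""
    return lambda x: f(pi1(x))(pi2(x))


def G(g):
    """G = [lambda g r s. g(<r,s>)]: turns a function on pairs into a curried one."""
    return lambda r: lambda s: g(pair(r, s))


def add(r):
    # A curried two-argument function.
    return lambda s: r + s


def mul_pair(p):
    # A one-argument function on pairs.
    return pi1(p) * pi2(p)
```

Applying G after F to add, or F after G to mul_pair, gives back functions that agree with the originals on all arguments tried, which is the isomorphism the lemma states.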

Let the function str be defined by

    str :≡ [λf x. ∂(x) ⊃ f(x), ⊥]

By T.P. we can show that str turns any function into one that is strict (with respect to the first
argument) and that it is a retract. This shows that the set of strict functions is a proper subdomain
of I.


A domain is called flat if it contains, besides ⊥ and ⊤, only pairwise incomparable elements. For flat
domains there is a computable equality relation = with:

    ∂(x) ≡ tt, ∂(y) ≡ tt, x ≡ y ⊢ x=y ≡ tt
    x=y ≡ tt ⊢ x ≡ y, ∂(x) ≡ tt, ∂(y) ≡ tt

In many cases it is very convenient to use the (computable) equality instead of the equivalence ≡
since it may appear inside a term and thus gives greater expressive power.

Lemma 2.4: If F ≡ [αf. [λe. t(f(e))]] and G ≡ [λe. [αf. t(f)]], where t is any term, then F ≡ G.

Proof:  T.P. 

Essentially,  the  lemma  means  that  constant  parameters  can  be  bound  "globally",  i.e.  they  need  not  be 
passed  on  with  every  call.  The  lemma  will  be  used  quite  often  in  the  remainder  of  the  paper 
without  being  referred  to  explicitly. 
3. Abstract Data Types in the Type Free Logic

In this section we introduce data types and discuss the representation of data structures in tfLCF.
We investigate properties of data types by looking at them from a more algebraic point of view,
which allows us to derive various function definition schemes. The basic function definition method
is illustrated in an example dealing with the translation of arithmetical expressions from infix to
postfix form.

What is intuitively meant by the notion abstract data type? There is a common understanding that,
in programming, a data type is not just a set, but also comprises information about the structure of
the elements and how to construct them and to operate on them. This can be done in an abstract
way, i.e. the only information available is the set of primitive operations (constructors, selectors,
recognizers) and relationships between them; it does not matter what the elements of the type look
like and how the primitive operations are implemented. In the context of a formal calculus the
relationships between the primitive operations are expressed by axioms.

The  presentation  concentrates  on  generic  recursive  types;  however,  in  subsection  3.4  extensions  to 
non-free  types  are  discussed. 


3.1 Data Type Definitions

We start with discussing free data types. The type system will be extended later to comprise a wider
class of types. A type definition is made by listing alternative subtypes. A subtype is either a constant
or a composed type. Composed data types are defined best by their abstract syntax [Me], using
constructors, selectors and recognizers to describe the structure of the type. In a more formal
BNF-like notation (using "constr" for constructor, "sel" for selector, and "dt" for data type):

    <type_def>  ← <type_name> :≡ <subtype> { | <subtype> }*
    <subtype>   ← <constant> | <comptype>
    <comptype>  ← <constr> ( <sel_1>:<dt_1>, ..., <sel_n>:<dt_n> )
    <constant>  ← <identifier>

with the restriction that the names of all constructors in a type definition and all selectors in one
composed type have to be distinct. A data type definition may be recursive, that is, any of the dt_i in
a subtype may be the name of the type to be defined. Mutually recursive data type definitions
are also permitted.

For example, the data type "sequence (linear list) of atoms" can be defined using this formalism by

    Seq :≡ emptyseq | mkseq(hd:atom, tl:Seq).

Strictly speaking, this data type definition is a type scheme, that is,

    seq :≡ emptyseq | mkseq(hd:dtype, tl:seq)

defines a type "sequence of elements of type dtype" for any data type dtype. This will be made more
precise in the following subsection. Besides seq we will use other standard data types (type schemes)
like binary trees, natural numbers, and pairs, defined by

    bintree :≡ mkbt(sub:dtype) | comp(fir:bintree, sec:bintree)
    nnum    :≡ zero | suc(nn:nnum)
    dpair   :≡ mkpair(fir:dtype1, sec:dtype2)
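
For readers who prefer a programming-language view, the type definitions above correspond closely to tagged unions. The following Python sketch is one possible rendering, not the representation used in the logic: each constant or constructor becomes a frozen dataclass, each selector becomes a field, and recognizers are isinstance tests. The class names and the use of Any for the parameter type dtype are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class EmptySeq:
    """The constant emptyseq."""


@dataclass(frozen=True)
class MkSeq:
    """Constructor mkseq with selectors hd and tl."""
    hd: Any
    tl: "Seq"


Seq = Union[EmptySeq, MkSeq]


@dataclass(frozen=True)
class Zero:
    """The constant zero."""


@dataclass(frozen=True)
class Suc:
    """Constructor suc with selector nn."""
    nn: "NNum"


NNum = Union[Zero, Suc]


@dataclass(frozen=True)
class MkBt:
    """Leaf constructor mkbt with a single selector."""
    sub: Any


@dataclass(frozen=True)
class Comp:
    """Constructor comp with selectors fir and sec."""
    fir: "BinTree"
    sec: "BinTree"


BinTree = Union[MkBt, Comp]


def is_emptyseq(x):
    return isinstance(x, EmptySeq)


def is_mkseq(x):
    return isinstance(x, MkSeq)
```

A value such as MkSeq("a", MkSeq("b", EmptySeq())) then plays the role of the two-element sequence, and its components are retrieved with the field names hd and tl.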


3.2  Representing  Data  Structures  in  tfLCF 

In  section  2 it  was  explained  that  retracts  can  be  regarded  as  the  "types"  in  the  type-free  logic.  The 
data  types  are  now  to  be  represented  in  LCF  in  such  a way  that  the  resulting  terms  are  retractions. 

What  exactly  is  implied  by  a data  type  definition?  Intuitively,  a data  type  should  have  the  following 
properties. 

a) A data type is the disjoint union of subtypes. A subtype is either a constant or a composed
subtype. For each subtype there is a predicate (characteristic function) which will be named
"is_const" or "is_<constr>" resp. These recognizers permit deciding membership in one of the
subtypes relative to the whole data type.

b) Each constructor is a one-to-one function; in particular, the corresponding selector functions allow
us to "retrieve" the respective arguments of a constructor.

c) A subtype has to be embedded explicitly into the type by a constructor function. For example,
"atoms" are not lists unless they are "converted" into lists. This helps us to keep all data types
disjoint.

These  statements  can  be  expressed  more  precisely  in  terms  of  LCF  axioms. 

Definition 3.1 (Axioms for generic data types):

The data type definition

    type :≡ constant_1 | ... | constant_m | comptype_1 | ... | comptype_n

with

    comptype_k :≡ comp_k(sel_k1:dt_k1, ..., sel_kjk:dt_kjk)     for k=1,...,n

is considered to correspond to the fixed point equation

(1) type ≡ [αF. [λx. is_constant_1(x) ⊃ x, ..., is_constant_m(x) ⊃ x,
                    is_comp_1(x) ⊃ comp_1(dt'_11(sel_11(x)), ..., dt'_1j1(sel_1j1(x))), ...,
                    is_comp_n(x) ⊃ comp_n(dt'_n1(sel_n1(x)), ..., dt'_njn(sel_njn(x))),
                    ⊥]]

where dt'_ki ≡ F if dt_ki ≡ type and dt'_ki ≡ dt_ki otherwise, and for i,j=1,..,m; k,l=1,..,n

(2) ∂(constant_i) ≡ tt
    is_constant_i(constant_i) ≡ tt

(3) ∂(comp_k(dt_k1(x_1), ..., dt_kjk(x_jk))) ≡ ∂(dt_k1(x_1)) ∧ ... ∧ ∂(dt_kjk(x_jk))
    ∂ ∘ comp_k ≡ is_comp_k ∘ comp_k

(4) ∀x. is_comp_k(x) ⇒ is_comp_l(x) ≡ ff                 for k≠l
    ∀x. is_constant_i(x) ⇒ is_constant_j(x) ≡ ff         for i≠j
    ∀x. is_constant_i(x) ⇒ is_comp_k(x) ≡ ff

(5) ∂ ∘ is_comp_k ≡ ∂ ∘ is_comp_l
    ∂ ∘ is_constant_i ≡ ∂ ∘ is_constant_j
    ∂ ∘ is_constant_i ≡ ∂ ∘ is_comp_k

(6) ∂(comp_k(x_1,...,x_jk)) ⇒ sel_kr(comp_k(x_1,...,x_jk)) ≡ x_r     for r=1,..,jk


Axiom (1) is a mere transcription of the type definition. It contains the basic information about the
type structure; therefore, it will be called the characterizing function of the type. The goal is to prove
that it is a retract. However, this cannot be done without further specifying the primitives occurring
in (1) by adding axioms expressing the statements a)-c). They make sure that the recognizers for the
subtypes are complementary (axiom 4) and that they are defined exactly for the elements of the type
(axiom 5). Axioms (6) state the generic nature of the type (this is equivalent to saying that
constructor and selector functions are essentially tupling and projections). All constructors are
assumed to be strict in each argument and total for arguments of correct types (axioms 3).

Example: The data type definition for (homogeneous) sequences

    Seq :≡ emptyseq | mkseq(hd:Atom, tl:Seq)

will generate the axioms

(S1) seq ≡ [αS. [λx. is_emptyseq(x) ⊃ emptyseq,
                    is_mkseq(x) ⊃ mkseq(atom(hd(x)), S(tl(x))),
                    ⊥]]

     where atom is the characterizing function for the data type Atom,

(S2) ∂(emptyseq) ≡ tt
(S3) is_emptyseq(emptyseq) ≡ tt
(S4) ∀x y. ∂(mkseq(atom(x), seq(y))) ≡ ∂(atom(x)) ∧ ∂(seq(y))
(S5) ∂ ∘ mkseq ≡ is_mkseq ∘ mkseq
(S6) is_emptyseq ≡ ¬ ∘ is_mkseq
(S7) ∀x y. ∂(mkseq(x,y)) ⇒ hd(mkseq(x,y)) ≡ x
(S8) ∀x y. ∂(mkseq(x,y)) ⇒ tl(mkseq(x,y)) ≡ y

Any type t_i that occurs in the definition of another type t_j is considered a base type for t_j. The notion
"base type" does not imply "basic" or "simpler"; on the contrary, since mutual dependence of data
types is permitted, t_j itself may be a base type for t_i (hence the relation "is-base-type-of" is only a
quasi-order). If data types are mutually dependent the corresponding characterizing functions form
a system of mutually recursive functions. Those base types that do not depend on the type to be
defined are called generating types (in fact, they generate the type in an algebraic sense; see the
following subsection).

At this point it has to be clarified what it means for a characterizing function to be a retraction. The
logical type of type is (type→type). In order to make it a function in I the primitives (constants,
constructors and selectors etc.) have to be specified as elements of I. This amounts to defining a model
of (the axioms describing) the data type in I. However, the retraction property can be proved just
from the axioms; more precisely, what can be proved is that type is an "abstract retraction", meaning
that every model is a retraction. All the models will be isomorphic retractions (in the categorical
sense). Thus the abstract retraction represents an equivalence class of objects in R.

It should be noted that the standard representation of data types in an LCF-like language is by
"domain equations" (involving + and ×; see [Sc2]). For example, the data type Seq of sequences of
atoms is completely specified by the least fixed point of the equation

    (i) Seq ≡ emptyseq :+ (Atom :× Seq)

(where :+ and :× are strict versions of + and ×) given a representation of the type Atom and the
constant emptyseq. However, we do not follow this line. The syntax of data type definitions, as given
in the preceding section, involves constructors and selectors; they are the primitives for defining
functions operating on the data type. But they do not appear in (i). In fact, (i) is an implementation
of the data type Seq (by functions in I) rather than an abstract definition; the primitives are hidden
in the construction of sum and product. In order to keep the previous higher level of abstraction
the primitives have to be axiomatized, as has been done for special types in [Ne1]. Although an
axiomatization as in definition 3.1 is by far less elegant than a definition like (i) it is more
appropriate for the purpose of program specification.

Incidentally, this discussion shows that every (generic) data type has the following standard model:
The data type definition

    type :≡ constant_1 | ... | constant_m | comptype_1 | ... | comptype_n

with

    comptype_k :≡ comp_k(sel_k1:dt_k1, ..., sel_kjk:dt_kjk)     for k=1,...,n

is simply translated into

    type ≡ constant_1 :+ ... :+ constant_m :+ comp_1 :+ ... :+ comp_n

with

    comp_k :≡ (dt_k1 :× ... :× dt_kjk)     for k=1,...,n

It is easy to figure out what the primitives look like, and the proof that the axioms of definition 3.1
hold is straightforward.

Lemma 3.2: type is strict.

Proof: Simple consequence of the fact that all functions involved are strict (by definition).

Theorem 3.3: type is a retract if all base types of type are retracts.

Proof: See appendix A.3.

From the characterizing function of a type we are going to derive a variety of function definition
schemes and functions. In particular, a type predicate (characteristic function) is_type can be derived
which yields tt exactly for the defined elements of the type, i.e. which satisfies:

    ∀x. is_type(x) ⇒ type(x) ≡ x
    ∀x. is_type(x) ⇒ ∂(x) ≡ tt

and

    type(x) ≡ x, ∂(x) ≡ tt ⊢ is_type(x) ≡ tt

This predicate will be discussed in greater detail in the following subsection.

For  a recursive  data  type  we  can  derive  the  standard  structural  induction  rule  from  the 
characterizing  function  (the  retraction). 

Theorem 3.4 (Structural Induction):

For a recursive data type type defined by

    type :≡ constant_1 | ... | constant_m | comptype_1 | ... | comptype_n

with

    comptype_k :≡ comp_k(sel_k1:dt_k1, ..., sel_kjk:dt_kjk)     for k=1,...,n

there is an induction rule that allows us to conclude

    Q ⊢ ∀x. is_type(x) ⇒ P(x)

(where the conclusion is meant to be a wff involving x each of whose awffs is prefixed by
∀x. is_type(x) ⇒ ...) from the following antecedents:

(a) For each constant constant_i (i=1,..,m)

    Q ⊢ P(constant_i)

(b) For each composed subtype comptype_k (k=1,..,n)

    Q, ..., P(y_kp), ..., ∂(comp_k(y_k1,...,y_kjk)) ≡ tt ⊢ P(comp_k(y_k1,...,y_kjk))

with an antecedent P(y_kp) for each recursion argument, i.e. for those arguments of comp_k with
dt_kp ≡ type.

Proof: see appendix A.3.

For example, the induction rule corresponding to the characterizing function seq is

    P(emptyseq)     P(y), ∂(mkseq(x,y)) ≡ tt ⊢ P(mkseq(x,y))
    ---------------------------------------------------------
    ∀x. is_seq(x) ⇒ P(x)

Note that the constructor arguments in the induction step need not be restricted by type predicates
(the restriction is implied by the definedness predicate). A discussion of other forms of the induction
rule that involve the retraction can be found in appendix A.3.
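
The induction rule for sequences has a computational counterpart: a function on sequences is fixed by one value for the empty sequence and one step for the constructor case. The Python sketch below illustrates that idea only; it is not one of the definition schemes of this paper. Sequences are represented as tagged tuples, undefined elements are ignored, and the name seq_rec is hypothetical.

```python
EMPTYSEQ = ("emptyseq",)


def mkseq(h, t):
    """Constructor: a sequence with head h and tail t."""
    return ("mkseq", h, t)


def is_emptyseq(x):
    return x == EMPTYSEQ


def hd(x):
    return x[1]


def tl(x):
    return x[2]


def seq_rec(base, step):
    """Return f with f(emptyseq) = base and f(mkseq(x, y)) = step(x, f(y))."""
    def f(s):
        if is_emptyseq(s):
            return base
        return step(hd(s), f(tl(s)))
    return f


# Two functions obtained from the scheme.
length = seq_rec(0, lambda x, rest: 1 + rest)
to_list = seq_rec([], lambda x, rest: [x] + rest)
```

A property of length or to_list is then proved with exactly the two cases of the induction rule: once for the empty sequence, and once for mkseq(x, y) under the hypothesis for y.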

As mentioned above, the type definition for seq is a type scheme, defining a data type for any type
dtype. This means that in the corresponding retraction seq the retract atom can be replaced by any
other retract. We can therefore define the functional

seqof :≡ [λtype. [αS. [λx. is_emptyseq(x) ⊃ emptyseq,
                           is_mkseq(x) ⊃ mkseq(type(hd(x)), S(tl(x))),
                           ]]]

By theorem 3.3 seqof(type) is a retract for any retract type. In other words, seqof maps retracts to
retracts. Obviously, any generic type construction yields such a mapping on retracts. Properties of
these functionals will be studied in the following subsection.


3.3  Algebraic Interpretation of Data Types

Interpreting data types in terms of universal algebra helps to clarify certain concepts and properties.
As a data type may involve several subtypes and functions of heterogeneous type, the appropriate
notion is that of a "heterogeneous algebra" (Birkhoff and Lipson [BI]; see also Higgins [Hi]).

Definition 3.5 (Heterogeneous algebra)

A heterogeneous algebra A consists of
- a family (A_j)_{j∈J} of (non-empty) sets; the A_j are called phyla.
- a family (f_k)_{k∈K} of operations; for each f_k there is an associated tuple
  index_k ≡ (i_k1, ..., i_kn_k; i_kn_k+1) of elements of J.




The index of the operation f_k indicates the phyla from which the arguments of f_k are taken (i.e. the
argument types) and the target phylum; n_k is the arity of f_k (possibly 0). In the present framework,
the index is simply the type of the operation.

The triple (J, K, (index_k)_{k∈K}) is called the signature of the algebra; it characterizes the basic
structure. Algebras are called similar if they have the same signature. As the structure of similar
algebras is comparable, it is possible to define structure-preserving mappings between them.


Definition 3.6 (Homomorphism)

Let A = ((A_j)_{j∈J}, (f_k)_{k∈K}) and B = ((B_j)_{j∈J}, (g_k)_{k∈K}) be similar algebras. A
homomorphism h from A to B is a family of mappings (h_j)_{j∈J} such that h_j maps A_j into B_j and
for each k∈K

h_{j_nk+1}(f_k(a_1, ..., a_nk)) ≡ g_k(h_{j_1}(a_1), ..., h_{j_nk}(a_nk))

where index_k ≡ (j_1, ..., j_nk; j_nk+1).


As mentioned in the previous section, a data type (or its domain) forms a complete lattice. So, the
appropriate algebraic structure is that of a heterogeneous lattice-algebra. Although the lattice
structure of the domains considered here is not very interesting (apart from the elements ⊥ and ⊤
the domains are flat), we have to take it into account by requiring that all functions preserve the
lattice structure. However, because of the simple structure it is sufficient to require that all functions
involved in the algebraic structure, i.e. the operations, are strict and total. Similarly, all the
mappings h_j constituting a homomorphism have to be strict (it is, however, not necessary to assume
totality). Henceforth, these assumptions will be made throughout the remainder of the paper.


Example. The data type Seq, regarded as a heterogeneous algebra, consists of the two phyla atom and
seq and operations

emptyseq:   ∅ → seq             (nullary)
mkseq:      atom × seq → seq.


The axioms in the previous section indicate that a data type corresponds to an absolutely free (or
"generic") algebra which is generated by the constants, the base types and the constructor functions
as operations. As is well known in algebra, an absolutely free algebra has characterizing universal
properties:

(1) There is (up to isomorphism) only one absolutely free algebra for given generating base types and
operations.

(2)  Any  homomorphism  from  an  absolutely  free  algebra  F into  another  algebra  A of  the  same  type  (in 
the  algebraic  sense)  is  determined  uniquely  by  functions  mapping  the  generating  sets  into  A. 

Properties  (1)  and  (2)  can  be  proved  in  LCF  for  each  (free)  data  type  without  relying  on  an 
algebraic  interpretation.  It  is  these  properties  we  are  going  to  exploit. 
In order to define a homomorphism it is sufficient to map the base types into target sets and the
constructors onto operations on the target structure. Then, by property (2), there is a unique function
that homomorphically extends the base function(s). Due to the fact that in LCF homomorphisms can
be "pushed through" conditionals, homomorphic extension is representable by a simple modification
of the function characterizing the data type: we only have to replace the base type retracts by the
base functions and the constructors by the operations on the target algebra.

To continue our example based on the data type seq, we notice that a homomorphism from seq into
an appropriate algebra is determined completely by

a)  a constant that is the image of emptyseq,

b)  a function that maps the base type atom into the corresponding set, and

c)  a binary operation on the target algebra.

In LCF, this is written as the functional

Sfun :≡ [λf const op. [αS. [λx. is_emptyseq(x) ⊃ const,
                                is_mkseq(x) ⊃ op(f(hd(x)), S(tl(x))),
                                ]]]

Assume R is the target structure with phyla R1 and R2, c an element of R2, op: R1 × R2 → R2 a binary
operation and fun a function from atom to R1. Then property (2) above yields the following
theorem:

Theorem 3.7: F ≡ Sfun(fun,c,op) is the unique homomorphic extension of fun with respect to c and op,
i.e., it is the only homomorphism from seq to R with F(emptyseq) ≡ c and

F ∘ mkseq ≡ [λx y. op(fun(x), F(y))].

The proof is straightforward; it crucially depends on the "freeness" of the type definition (i.e. axioms
(S7), (S8)) which is necessary to establish the homomorphism property of F.

A simple example is the type predicate (characteristic function) for seq: the function is_seq: seq → T
with

is_seq(x) ≡ tt  iff  seq(x) ≡ x and d(x) ≡ tt

is definable simply by extending the type predicate is_atom of the generating base type to a
homomorphism into T:

is_seq ≡ Sfun(is_atom, tt, ∧)
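The homomorphic extension functional can be illustrated outside LCF. The following Python sketch is only an analogy under stated assumptions: sequences are modelled as nested pairs, atoms are taken to be Python strings, undefinedness is not modelled, and the names sfun, is_atom and length are chosen here for illustration and do not come from the paper.

```python
# Sequences as nested pairs: () plays the role of emptyseq, (x, s) of mkseq(x, s).
EMPTYSEQ = ()

def mkseq(x, s):
    return (x, s)

def sfun(f, const, op):
    """Homomorphic extension: emptyseq -> const, atom -> f(atom), mkseq -> op."""
    def ext(s):
        if s == EMPTYSEQ:
            return const
        head, tail = s
        return op(f(head), ext(tail))
    return ext

def is_atom(a):
    # Assumption of this sketch: atoms are Python strings.
    return isinstance(a, str)

# is_seq = Sfun(is_atom, tt, AND); length is a second instance of the same scheme.
is_seq = sfun(is_atom, True, lambda p, q: p and q)
length = sfun(lambda a: 1, 0, lambda n, m: n + m)

abc = mkseq("a", mkseq("b", mkseq("c", EMPTYSEQ)))
```

Both functions satisfy the two homomorphism equations by construction, e.g. length(mkseq(x, y)) equals 1 + length(y).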

A further property of the homomorphic extension functional Sfun is that it carries over a structural
induction rule from the source domain. E.g. for sequences:

P(c)    P(y), d(op(x,y)) ≡ tt ⊢ P(op(x,y))
------------------------------------------
∀x. is_seq(x) ⇒ P(F(x))

thus permitting induction on (target) domains originally not structured appropriately.

The mathematical content of the discussion on interpreting data types as absolutely free algebras
and the homomorphic extension functionals amounts to a well-established fact known from category
theory: the correspondence between free constructions (free objects) and representable functors. The
pair (Sdom, Sfun) defines a functor from R into a subcategory of R of "suitably structured" retracts.

The point is that this correspondence can be established within the framework of LCF. Since
everything is represented as LCF terms, objects and morphisms as well as functors, mechanically
assisted proofs can be carried out rather easily. For the time being, theorems like the one
mentioned above have to be proved in LCF for each data type separately, although the structure of
the proof is always the same. However, there is some hope that formal proofs of general statements
about, e.g., all generic data types will be feasible using a metatheory of LCF being developed on the
basis of representing the LCF notions as data types (see section 6 for part of the data type
definition).

The usefulness of homomorphisms as a structuring principle has been observed elsewhere, in
particular in the context of program translation [Mo, MiW]. However, though homomorphic
extension is a rather powerful scheme for function definition, it is by far not powerful enough. It
turns out that properties similar to those proved for homomorphic extension can be shown for a
more general class of definition schemes; this will be discussed in section 5.


3.4  Non-generic Data Types

Although the class of generic data types covers many of the structures needed in programming, it is
not comprehensive enough. Relaxing the restriction to generic structures is tantamount, in
algebraic terms, to allowing further relations to be added to a type definition. In a way, the generic
data types can be regarded as the "context free types," and adding relations as "introducing context."
In the context of this paper it is sensible to consider only relations that are expressible as recursive
predicates.

The general method will be discussed by means of an example. Let norep(x) be a predicate on
sequences which is true iff x does not contain repetitions of elements (the explicit definition is
straightforward). Then the data type norepseq of "sequences without repetitions" is just the
restriction of seq by norep. The new type can be represented in the following way: whenever an
element is added to a sequence it is checked first whether it already occurs in it, in which case
nothing is done. That is, if the constructor mkseq is modified to

mknorepseq :≡ [λx y. norep(mkseq(x,y)) ⊃ mkseq(x,y), y]

sequences constructed by mknorepseq have the "no-repetition" property, i.e. they satisfy the
predicate norep. This is just another application of the homomorphic extension functional: the
range of the function

norepseq :≡ Sfun(dtype, emptyseq, mknorepseq)

is exactly the desired subset of sequences without repetitions (it is obvious that only elements of seq
are constructed). In other words, norepseq represents ordered sets of elements of type dtype.
Obviously, norepseq is a retract of seq. Since norepseq defines a subdomain of seq it is also a retract
of 1 which means that norepseq makes sense (in the present context) as a data type. This
construction for new retracts works at least in the case where a new type is defined by a restrictive
predicate. The full extent of the method, however, needs to be explored further. It is conjectured that
any data type (given a reasonable definition in terms of computability) is representable as a retract of
a generic type; this would parallel the fact that, in formal language theory, any recursively
enumerable set is the image of context-free sets under suitable mappings.
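The construction of norepseq can be imitated in a few lines of Python. This is a sketch under assumptions, not the LCF definition: sequences are nested pairs, the base retract dtype is replaced by the identity function, and the helpers to_list and from_list are hypothetical conveniences introduced here.

```python
EMPTYSEQ = ()

def mkseq(x, s):
    return (x, s)

def sfun(f, const, op):
    """Homomorphic extension functional for sequences."""
    def ext(s):
        if s == EMPTYSEQ:
            return const
        head, tail = s
        return op(f(head), ext(tail))
    return ext

def from_list(xs):
    s = EMPTYSEQ
    for x in reversed(xs):
        s = mkseq(x, s)
    return s

def to_list(s):
    out = []
    while s != EMPTYSEQ:
        head, s = s
        out.append(head)
    return out

def norep(s):
    """True iff the sequence contains no repeated element."""
    items = to_list(s)
    return len(items) == len(set(items))

def mknorepseq(x, y):
    # Modified constructor: add x only if the result is still repetition-free.
    candidate = mkseq(x, y)
    return candidate if norep(candidate) else y

# norepseq = Sfun(dtype, emptyseq, mknorepseq), with dtype taken as the identity.
norepseq = sfun(lambda a: a, EMPTYSEQ, mknorepseq)
```

Applying norepseq twice gives the same result as applying it once, which is the retract property claimed in the text.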


3.5  An Example: Infix to Postfix Translation

As an example we show how to generate a function that translates arithmetical expressions from
infix to postfix notation (the example was suggested by J. Allen). The abstract syntax of the
structures is defined by

exp  :≡ mktexp(te:term) | mksexp(su1:exp, su2:term)
term :≡ mkfterm(tf:fact) | mkpterm(pr1:term, pr2:fact)
fact :≡ mkvfact(fv:var) | mkefact(fe:exp)

and

post :≡ mkvpost(pv:var) | mksum(s1:post, s2:post) | mkprod(p1:post, p2:post)

which may be thought of as abstraction from the "concrete" infix grammar

<exp>  :≡ <exp> '+ <term> | <term>
<term> :≡ <term> '× <fact> | <fact>
<fact> :≡ <var> | '( <exp> ')

and the postfix grammar

<post> :≡ <var> | <post> <post> '+ | <post> <post> '×

Now, the problem is to find a function that translates variables into variables and infix-sums and
infix-products into postfix-sums and postfix-products, respectively. This is a simple example of a
homomorphism between heterogeneous algebras. The algebra Exp includes the 4 phyla exp, term, fact
and var, the algebra Post the phyla post and var. The homomorphism maps exp, term and fact into
the phylum post and var onto var; that is, the homomorphism consists of 4 mappings

id:     var  → var
Texp:   exp  → post
Tterm:  term → post
Tfact:  fact → post
These mappings have to respect the corresponding algebraic operations

mksexp:   exp × term → exp     <->   mksum:    post × post → post
mkpterm:  term × fact → term   <->   mkprod:   post × post → post
mkvfact:  var → fact           <->   mkvpost:  var → post

i.e., they must satisfy equivalences

Texp(mksexp(x,y)) ≡ mksum(Texp(x), Tterm(y))

etc. Since the distinction between exp, term and fact disappears in Post, the "operations"
corresponding to mktexp, mkfterm and mkefact are just identities on post. Having established all the
algebraic correspondences, homomorphic extension immediately yields the desired functions (slightly
simplified):


Texp  ≡ [αE. [λx. is_texp(x) ⊃ Tterm(te(x)),
                  is_sexp(x) ⊃ mksum(E(su1(x)), Tterm(su2(x))),
                  ]]

Tterm ≡ [αF. [λx. is_fterm(x) ⊃ Tfact(tf(x)),
                  is_pterm(x) ⊃ mkprod(F(pr1(x)), Tfact(pr2(x))),
                  ]]

Tfact ≡ [λx. is_vfact(x) ⊃ mkvpost(var(fv(x))),
             is_efact(x) ⊃ Texp(fe(x)),
             ]
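The three mutually recursive translation functions can be tried out in the following Python sketch. It is an illustration only: the abstract syntax is encoded as tagged tuples, variables are plain strings, and the function show, which prints a post structure as a postfix string with "+" and "*", is a hypothetical helper that is not part of the paper.

```python
# Constructors of the infix algebra Exp, encoded as tagged tuples.
def mktexp(t): return ("texp", t)
def mksexp(e, t): return ("sexp", e, t)
def mkfterm(f): return ("fterm", f)
def mkpterm(t, f): return ("pterm", t, f)
def mkvfact(v): return ("vfact", v)
def mkefact(e): return ("efact", e)

# Constructors of the postfix algebra Post.
def mkvpost(v): return ("vpost", v)
def mksum(a, b): return ("sum", a, b)
def mkprod(a, b): return ("prod", a, b)

def texp(x):
    if x[0] == "texp":
        return tterm(x[1])
    if x[0] == "sexp":
        return mksum(texp(x[1]), tterm(x[2]))
    raise ValueError("not an exp")

def tterm(x):
    if x[0] == "fterm":
        return tfact(x[1])
    if x[0] == "pterm":
        return mkprod(tterm(x[1]), tfact(x[2]))
    raise ValueError("not a term")

def tfact(x):
    if x[0] == "vfact":
        return mkvpost(x[1])
    if x[0] == "efact":
        return texp(x[1])
    raise ValueError("not a fact")

def show(p):
    """Print a post structure in postfix notation (helper for this sketch)."""
    if p[0] == "vpost":
        return p[1]
    op = "+" if p[0] == "sum" else "*"
    return show(p[1]) + " " + show(p[2]) + " " + op

# a + b * c
a, b, c = mkvfact("a"), mkvfact("b"), mkvfact("c")
e1 = mksexp(mktexp(mkfterm(a)), mkpterm(mkfterm(b), c))
# (a + b) * c
e2 = mktexp(mkpterm(mkfterm(mkefact(mksexp(mktexp(mkfterm(a)), mkfterm(b)))), c))
```

Here show(texp(e1)) yields "a b c * +" and show(texp(e2)) yields "a b + c *".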
4.  Elements of a Problem Specification Language

This section is devoted to discussing a rudimentary "problem specification language." The language
consists of the terms of typed LCF, augmented by certain constructions that are considered natural
or helpful for concise specification of problems or, more precisely, functions over data types. The
main extension is a first-order like calculus that makes it possible to talk about sets and
quantification in a way consistent with the computational logic; it is an extension of the LCF terms
in their typed form. Using the definition techniques developed in the preceding section, the added
constructions are interpreted as LCF terms, which gives them the intended meaning as computation
rules or "programs."


4.1  Sets,  Set  Operations  and  Quantification 
Syntax 

Types. The language is typed, i.e. a type is associated with each term. There is a predefined type,
T, the domain of truth values. New types can be defined explicitly as data types (see below). For
each type t we have a type setof(t) denoting the powerset type "sets of elements of type t". More
formally:

Definition  4.1  (Types): 

(1)  T is  a type. 

(2)  Data  types  are  types. 

(3)  If t1 and t2 are types, then t1→t2 is a type (the type of functions from t1 to t2).

(4)  If t is a type then setof(t) is a type.

(5)  These  are  all  the  types. 

Types  built  by  (4)  are  called  set  types.  No  data  type  is  a set  type.  Although  types  are  not  sets,  we  use 
the  type  name  also  to  denote  the  set  of  individuals  of  that  type.  There  are  no  equalities  between 
types;  different  type  expressions,  in  particular  different  type  names,  denote  different  types. 

Note  the  distinction  between  "types"  and  "data  types".  Types  are  the  sorts  in  the  logic,  whereas  the 
notion  data  type  is  used  more  in  the  sense  of  data  types  in  programming  languages  which  involves 
certain  assumptions  about  the  (internal)  structure  of  the  typed  objects.  By  (2)  in  definition  4.1  data 
types  are  assumed  to  coincide  with  certain  logical  types. 

Terms. We use the notation s:t to denote a term s of type t. If t is a set type then s:t is called a set
term. All LCF terms (1 - 6) are terms of our language (cf. [Mil] and appendix A.1.1). Besides the
LCF terms the language includes terms for expressions involving sets and bounded quantification
(7 - 9).


Definition  4.2  (Terms): 

(1)  The constants ⊥, tt, ff, ⊤ are terms of type T.

(2)  Any identifier is a term.

(3)  If s:t1→t2 and x:t1 are terms then s(x):t2 is a term.

(4)  If x:t1 is an identifier and s:t2 is a term then [λx.s(x)]:t1→t2 is a term.

(5)  If p:T, q,r:t are terms then (p ⊃ q, r):t is a term.

(6)  If x:t is an identifier and s:t a term then [αx.s]:t is a term.

(7)  If x:t is an identifier and S:setof(t) a set term then (x∈S):T is a term.

(8)  If x:t is an identifier, S:setof(t) a set term and P:t→T a predicate term then (∀x∈S. P(x)):T
and (∃x∈S. P(x)):T are truth value terms.

(9)  These are all the terms.

As usual, parentheses and brackets can be omitted as long as parsing is unambiguous. The notions
awff and wff are used as in typed LCF (see appendix A.1.1).

Note that the use of the sign ∀ for quantification in (8) cannot be confused with the use of ∀ in
abbreviations for λx.t ≡ λx.s. The former always requires a restricting set whereas the wff-∀ is never
restricted.

Semantics

The aim is to interpret the extended typed language in the type free calculus. This is done by
showing that every type corresponds to a retract in tfLCF. Since the representation of data types as
retracts has already been discussed, it remains to show how sets are to be represented. Based on the
set representation we then have to find interpretations for the set operations and quantifications.

The most common way of introducing sets into an environment of structures is by representing them
as sequences (linear lists) of non-repeating elements. As we are not interested in axiomatizing set
theory but rather look for a convenient definition of function meanings, we rely on such a
representation in LCF (cf. [Nel]). It will turn out later that sets are needed mainly as a conceptual
intermediate step which can be eliminated in actual programs. Besides, representing sets by
sequences fits nicely into the algebraic framework. Actually, what is to be represented is a rather
restricted kind of sets: we are only dealing with homogeneous and computable (mainly even finite)
sets. However, the required homogeneity is not really restrictive as one can always define the "sum
type."

The first step is to define a membership predicate x∈S for sequences, yielding tt if x occurs in S and ff
otherwise (if it is defined). It is definable as a homomorphism from Seq into T by homomorphically
extending equality on atoms:

∈ :≡ [λx. Sfun([λy. x=y], ff, ∨)]

Note that ∈ is defined for appropriate types only; if x does not have the same type as the elements of
S, = is undefined, and thus so is x∈S.




Using the predicate ∈, a function ∪₁ on atom × seq is definable by

∪₁ :≡ [λx S. x∈S ⊃ S, mkseq(x,S)]

∪₁ guarantees that elements already occurring in a sequence will not be added; sequences built up
using ∪₁ are those directly representing sets. If Seq is the data type of sequences of elements of type
t, the type setof(t) is the image of Seq under the homomorphism

set :≡ Sfun(id, emptyseq, ∪₁).

Moreover, set is a retract on seq; it defines the same subdomain of seq as the function norepseq
discussed in subsection 3.4: setof(t) corresponds to the subset of sequences without repetitions of
elements. (However, it is not a subalgebra of seq.) From this it follows that functions defined on Seq
are equally defined on setof(t). Furthermore, the (generic) structure of Seq can be used for defining
functions on setof(t). More specifically, we have the embedding iset: setof(t) → seq with

set ∘ iset ≡ id_setof(t)
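The membership predicate, the insertion function ∪₁ and the retract set can be sketched in Python as follows. The sketch assumes sequences encoded as nested pairs and ignores undefined values; the names member, union1, set_of, from_list and to_list are chosen here for illustration only.

```python
EMPTYSEQ = ()

def mkseq(x, s):
    return (x, s)

def sfun(f, const, op):
    """Homomorphic extension functional for sequences."""
    def ext(s):
        if s == EMPTYSEQ:
            return const
        head, tail = s
        return op(f(head), ext(tail))
    return ext

def from_list(xs):
    s = EMPTYSEQ
    for x in reversed(xs):
        s = mkseq(x, s)
    return s

def to_list(s):
    out = []
    while s != EMPTYSEQ:
        head, s = s
        out.append(head)
    return out

def member(x, s):
    # Membership as the homomorphic extension of equality with x: Sfun([y. x=y], ff, OR).
    return sfun(lambda y: x == y, False, lambda p, q: p or q)(s)

def union1(x, s):
    # Insert a single element unless it already occurs in s.
    return s if member(x, s) else mkseq(x, s)

# set = Sfun(id, emptyseq, union1)
set_of = sfun(lambda a: a, EMPTYSEQ, union1)
```

For example, set_of maps the sequence 1, 2, 1, 3 to the repetition-free sequence 2, 1, 3, and applying set_of again leaves that result unchanged.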

Thus, any function f: seq → D can be restricted to set by composing with iset. In this way, the
predicate ∈ defined above becomes the set-theoretic element relation. Similarly, we obtain an
interpretation of quantified terms by applying homomorphic extension to any predicate P. Let
operators all and exist be defined by

all :≡ [λP. Sfun(P, tt, ∧)]

exist :≡ [λP. Sfun(P, ff, ∨)]

Then

∀x∈S. P(x) :≡ all(P,S)

and

∃x∈S. P(x) :≡ exist(P,S).

Note that this form of quantification is well-defined if S and P are defined; since ⊥ or ⊤ is never an
element of a set, it will not appear in quantification (and cause a non-defined truth value).
Furthermore, a quantified term denotes a computable function if the predicate P and the term
denoting the restricting set S are computable, which is guaranteed by the way terms can be built up.

Using these constructs set inclusion is easily expressed by

S1 ⊆ S2 :≡ ∀x∈S1. x∈S2

and similarly set equality by the "extensionality" property

S1 = S2 :≡ (∀x∈S1. x∈S2) ∧ (∀x∈S2. x∈S1)

Note again that these relations will be undefined for sets over different types. The empty set is the
image under set of the empty sequence; we will identify the former with the latter.
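Bounded quantification, inclusion and extensional equality can be mimicked in Python in the same style. This is a sketch under the assumptions that sets are repetition-free sequences encoded as nested pairs and that all predicates are total; the names all_, exist, subset and set_equal are illustrative and not taken from the paper.

```python
EMPTYSEQ = ()

def mkseq(x, s):
    return (x, s)

def sfun(f, const, op):
    """Homomorphic extension functional for sequences."""
    def ext(s):
        if s == EMPTYSEQ:
            return const
        head, tail = s
        return op(f(head), ext(tail))
    return ext

def from_list(xs):
    s = EMPTYSEQ
    for x in reversed(xs):
        s = mkseq(x, s)
    return s

def all_(p):
    # all = [P. Sfun(P, tt, AND)]
    return sfun(p, True, lambda a, b: a and b)

def exist(p):
    # exist = [P. Sfun(P, ff, OR)]
    return sfun(p, False, lambda a, b: a or b)

def member(x, s):
    return exist(lambda y: x == y)(s)

def subset(s1, s2):
    return all_(lambda x: member(x, s2))(s1)

def set_equal(s1, s2):
    return subset(s1, s2) and subset(s2, s1)
```

Note that set_equal compares sets independently of the order in which their elements happen to be stored, as required by extensionality.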
The function ∪₁, taken as a function from t × setof(t) to setof(t), inserts a single element into a set;
extending this function homomorphically in the first argument yields ordinary set union ∪. As a
shorthand notation we will use ∪ⁿ for n-ary union ((n-1)-fold composition of ∪).

Similarly, set intersection and set difference are definable by means of the function
remove: t × setof(t) → setof(t) that removes an element from a set. remove is defined by

remove :≡ [λx s. x∈s ⊃ rem(x,s), s]
where rem is the endomorphic extension of
rem1 :≡ [λx y. x=y ⊃ {}, {y}].
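Set union and remove can be obtained from the same functional, as the following Python sketch suggests. It is one possible reading of the definitions above: sets are repetition-free sequences encoded as nested pairs, union extends the single-element insertion in its first argument, and rem replaces each element y by the empty set or the singleton {y} and joins the pieces by union. All Python names are assumptions of this sketch.

```python
EMPTYSEQ = ()

def mkseq(x, s):
    return (x, s)

def sfun(f, const, op):
    """Homomorphic extension functional for sequences."""
    def ext(s):
        if s == EMPTYSEQ:
            return const
        head, tail = s
        return op(f(head), ext(tail))
    return ext

def from_list(xs):
    s = EMPTYSEQ
    for x in reversed(xs):
        s = mkseq(x, s)
    return s

def to_list(s):
    out = []
    while s != EMPTYSEQ:
        head, s = s
        out.append(head)
    return out

def member(x, s):
    return sfun(lambda y: x == y, False, lambda p, q: p or q)(s)

def insert(x, s):
    # Single-element insertion (the function written as union-1 in the text).
    return s if member(x, s) else mkseq(x, s)

def union(s, t):
    # Extend insert homomorphically in the first argument; t is the image of emptyseq.
    return sfun(lambda a: a, t, insert)(s)

def rem(x, s):
    # rem1 maps y to {} if x = y and to {y} otherwise; the pieces are joined by union.
    rem1 = lambda y: EMPTYSEQ if x == y else mkseq(y, EMPTYSEQ)
    return sfun(rem1, EMPTYSEQ, union)(s)

def remove(x, s):
    return rem(x, s) if member(x, s) else s
```

With these definitions, removing 2 from the set 1, 2, 3 gives 1, 3, and removing an element that does not occur returns the set unchanged.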


If { x | x∈S } is used as an equivalent notation for S, the term language can be extended to include
sets that are characterized by predicates. However, one has to be careful: a set { x | P(x) } need
not be constructive even for computable P, if no domain is indicated. Therefore, predicates for set
formation have to be restricted to those based on set expressions, i.e. elementary predicates x∈S. All
other predicates have to be restrictive in the sense that they restrict a set to a subset ("filter
predicates").


Definition 4.3 (admissible predicates):

The set of admissible set predicates is defined by

(1)  The elementary predicates x∈S are admissible set predicates.

(2)  If P is an admissible set predicate and Q any predicate, then P∧Q is an admissible set
predicate.

(3)  If P and Q are admissible set predicates, then P∨Q and P∧Q are admissible set predicates.


Lemma 4.4:

If P and Q are admissible set predicates, then

{ x | P(x) ∨ Q(x) } ≡ { x | P(x) } ∪ { x | Q(x) }

and

{ x | P(x) ∧ Q(x) } ≡ { x | P(x) } ∩ { x | Q(x) }


It can be shown that the operations defined here have most of the standard properties. However, the
well-known problems caused by only partial recursive predicates still remain. For example,

¬(∀x∈S. P(x)) ≡ ∃x∈S. ¬P(x)

is true only if P is total on the domain under consideration.


It is obvious that the representation of sets and set operations provides a model for a theory of (finite)
sets. In particular, a first-order like calculus based on the restricted quantifiers is available for
proving properties of functions. Note that this calculus is constructive in the sense that all
expressions denote computable functions (cf. [Co]).

As the type system does not include basic set types, sets have to be generated from objects that are
not sets. There is a canonical way of deriving set-valued functions from types. Recall that a type t_i
is a base type for a data type t if it occurs in its definition. For each type t_i that is a base type for
t, a set-valued function is obtained by homomorphically extending the mapping
base-type → singleton-set. More precisely, in the homomorphic extension constructors are replaced
by set union (with appropriate arity); those parts of a structure that do not involve elements of type
t_i are mapped onto the empty set. An example can be found in section 5.


4.2  Schemes for Function Definition

In section 3 we introduced a method called "homomorphic extension" for defining new functions
over a data type. A particularly simple special case of this method is the endomorphic extension of a
function. An endomorphism is a homomorphism from an algebra into itself. Since all the algebraic
operations remain unchanged, the only parameters of endomorphic extension are the functions on
the base types to be extended. A typical example is substitution of terms for variables. Recall the
data type definition for binary trees over atoms from section 3:

bintree :≡ mkbt(sub:atom) | comp(fir:bintree, sec:bintree)

where atom is the generating base type. The corresponding endomorphic extension functional is

BTend :≡ [λf. [αE. [λx. is_mkbt(x) ⊃ f(atom(sub(x))),
                        is_comp(x) ⊃ comp(E(fir(x)), E(sec(x))),
                        ]]]

Now if we want to solve the problem

"Find a function varsubst: bintree → bintree such that varsubst replaces all atoms in a binary
tree by their values under the function varsub: atom → bintree,"

then a solution is simply

varsubst ≡ BTend(varsub),

and this solution is even unique, as was shown in section 3.
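The endomorphic extension for binary trees can be sketched in Python as follows. The encoding of trees as tagged tuples, the use of strings as atoms, and the particular substitution varsub are assumptions made for this illustration.

```python
# Binary trees over atoms, encoded as tagged tuples.
def mkbt(a):
    return ("bt", a)

def comp(first, second):
    return ("comp", first, second)

def bt_end(f):
    """Endomorphic extension: atoms are replaced by f(atom), comp stays comp."""
    def ext(x):
        if x[0] == "bt":
            return f(x[1])
        return comp(ext(x[1]), ext(x[2]))
    return ext

def varsub(a):
    # Hypothetical substitution: the atom "x" is replaced by the tree (p . q),
    # every other atom is left as it is.
    return comp(mkbt("p"), mkbt("q")) if a == "x" else mkbt(a)

# varsubst = BTend(varsub)
varsubst = bt_end(varsub)

tree = comp(mkbt("x"), comp(mkbt("y"), mkbt("x")))
```

Every occurrence of "x" in tree is replaced by the subtree (p . q), while the shape of the rest of the tree is preserved.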

So far we have been looking at homomorphisms only. Unfortunately, many interesting functions
cannot be represented as homomorphisms. But we can apply a similar definition technique to a
larger class of functions simply by explicitly stating the non-homomorphic part of the function and
using the extension functional for the homomorphic rest. This situation occurs often with data types
which include several composed subtypes; an example can be found in the next section.

The functionals derived from a data type definition (for homomorphic, endomorphic extension etc.)
not only permit definition of new functions in a concise way, they also facilitate proving properties.
In fact, certain properties of those functions derive from properties of the functionals, like the
induction proof rule already mentioned above.

Lemma  4.5: 

If the argument functions of an extension functional are strict/total then the resulting function is
strict/total.

Note  that  totality  entails  that  any  program  derived  from  a function  by  "meaning-preserving" 
transformations  terminates  on  defined  inputs. 

There are other definition schemes that hitherto have defied a natural algebraic interpretation.
Consider, for example, the following form of function iteration. Let the expression

[∀x∈S : f(x,z)]

be interpreted as "For each x in S apply [λy. f(x,y)] to z." This can be made more precise by a
recursion on the sequence representing S:

[λS z. [∀x∈S : f(x,z)]] ≡ [αF. [λS z. is_emptyseq(S) ⊃ z, F(tl(S), f(hd(S), z))]]

However, this interpretation causes some problems. In order to be a conservative extension of the
specification language as defined so far, the given interpretation has to be consistent with the notions
introduced previously. In particular, if two sets S and S' are equal one would expect

[∀x∈S : f(x,z)] ≡ [∀x∈S' : f(x,z)]

This implies that the applications of the f(x,_) must be independent of the particular representation
of S, i.e. the "hidden order" on S; or, at least, it must be guaranteed that the sequence of
applications of f can be executed in any order. This severely restricts applicability of the
construction; in many cases it may not be easy or even possible to verify this kind of commutativity.
Although operators like function iteration are necessary to make the specification language powerful
enough, they will not be discussed further in this paper.
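The dependence on the "hidden order" is easy to observe in a small experiment. The following Python sketch implements the recursion above as a loop over a Python list standing in for the sequence that represents S; the two sample functions add and shift_add are hypothetical and serve only to contrast an order-independent with an order-dependent case.

```python
def iterate(f, seq, z):
    """F(S, z) = z if S is empty, else F(tl(S), f(hd(S), z)), written as a loop."""
    for x in seq:
        z = f(x, z)
    return z

def add(x, z):
    # The applications add(x, _) commute, so the order of S is irrelevant.
    return x + z

def shift_add(x, z):
    # The applications shift_add(x, _) do not commute.
    return 2 * z + x

same = iterate(add, [1, 2, 3], 0) == iterate(add, [3, 1, 2], 0)
differs = iterate(shift_add, [1, 2], 0) != iterate(shift_add, [2, 1], 0)
```

The sequences 1, 2 and 2, 1 represent the same set, yet iterating shift_add over them gives different results, so this f would not be admissible in the construct.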

4.3  Transformation  of  Function  Definitions  into  Programs 

So far we have been discussing methods for defining functions over structured data and their
interpretation in LCF. Now, every LCF term also has an interpretation as a computation rule for
the function denoted by it. Given an interpreter for LCF, this allows all the functions definable in
the language to be computed. However, the resulting computations would be quite inefficient,
in particular because of nestings of unnecessary recursions resulting from direct interpretation of the
constructs. Consider, for example, the expression

F(y) ≡ ∀x∈S(y). P(x)

where the type of y is the data type list as defined above and S the standard set-valued function
set_of_atom. Since list is a recursive type, one recursion is required to compute S(y) and another one
to compute the quantified expression; but we can do much better by utilizing the underlying
algebraic structure. Note that the value of F is determined by the values of P(x); moreover, we have

P ≡ F ∘ mkbt

which means that F is a homomorphic extension of P. Because of the uniqueness property it follows
that

F ≡ BThom(P, ∧)

where BThom is the homomorphic extension functional for bintree. This means that F can be
replaced by an equivalent function that involves only one recursion. Apart from that, the explicit
representation of the set S(y) is eliminated.
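The simplification can be checked on small inputs. In the following Python sketch, trees are tagged tuples, the set S(y) is represented by a Python frozenset rather than by a repetition-free sequence, and the names bt_hom, set_of_atom, f_naive and f_fused are assumptions introduced for this illustration.

```python
# Binary trees over atoms, encoded as tagged tuples.
def mkbt(a):
    return ("bt", a)

def comp(first, second):
    return ("comp", first, second)

def bt_hom(f, op):
    """Homomorphic extension for binary trees: atom -> f(atom), comp -> op."""
    def ext(x):
        if x[0] == "bt":
            return f(x[1])
        return op(ext(x[1]), ext(x[2]))
    return ext

# Standard set-valued function: singleton sets at the leaves, union at comp.
set_of_atom = bt_hom(lambda a: frozenset([a]), lambda s, t: s | t)

def f_naive(p, y):
    # Two traversals: build S(y) first, then quantify over it.
    return all(p(x) for x in set_of_atom(y))

def f_fused(p):
    # One traversal: F = BThom(P, AND); the set S(y) is never built.
    return bt_hom(p, lambda a, b: a and b)

even = lambda n: n % 2 == 0
t1 = comp(mkbt(2), comp(mkbt(4), mkbt(6)))
t2 = comp(mkbt(2), comp(mkbt(3), mkbt(6)))
```

On both sample trees the fused function agrees with the two-pass version, which is what the uniqueness property predicts.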

This is an example of how the algebraic concepts can be used to simplify function definitions
considerably. It shows that the interpretation of the specification language is not a case of simple
macro expansion, but a possibly non-deterministic process of simplifying expressions in a suitable
way, which is similar to, e.g., theorem proving. More heuristic methods for recursion removal have
been studied by R. Burstall and J. Darlington [H.D].

The regular expression structure that results from defining functions by means of definition schemes
is of advantage at all levels of program development. Apart from the techniques for proving
properties about them (see above), it permits uniform application of optimizing transformations, like
replacing recursion by iteration. Even at the implementation level it can be advantageous: for
example, functions defined by endomorphic extension can be implemented in such a way that no
additional storage (for data) is required (cf. selective updating in [Ho]). If it has been proved that
the transformation and implementation techniques preserve meanings, then the "correctness" of
resulting programs can be guaranteed. Meaning-preserving transformations will be studied in
greater detail in a subsequent paper.
5.  An  Example:  Substitution  with  α-Conversion

5.1  The  Data  Types 

In  the  example  now  to  be  discussed  we  have  four  data  types,  defined  by 

term  :≡ mkcterm(constof:const) |
         mkvterm(varof:var) |
         mkapply(funeof:term, argeof:term) |
         mklambda(bvarof:bvar, termof:term) |
         mkmu(bvarof:bvar, termof:term) |
         mkcond(condof:term, trueof:term, falseof:term)

bvar  :≡ mkbvar(varof:var)

const :≡ uu | tt | ff | oo

var  is  taken  as  basic  and  not  further  specified. 

The reader will notice that these data types represent the abstract syntax of LCF terms. In algebraic
terms the types form a heterogeneous algebra with the four phyla term, bvar, var, and const and
operations


(op1)   mkcterm:    const → term
(op2)   mkvterm:    var → term
(op3)   mkapply:    term × term → term
(op4)   mklambda:   bvar × term → term
(op5)   mkmu:       bvar × term → term
(op6)   mkcond:     term × term × term → term
(op7)   mkbvar:     var → bvar

The generating phyla (data types) are const and var. Obviously the phyla var and bvar are
isomorphic; the reason for introducing the data type bvar is that it is more convenient to separate the
binding occurrences of variables from the other ones.
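
One possible concrete rendering of these phyla and operations, as a sketch in Python: each constructor becomes a frozen dataclass whose fields are the selectors. Modelling var as a plain string, and the class names `MkCterm`, `MkApply` and so on, are assumptions of this sketch and not part of the paper; the selector names follow the type definitions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

Var = str                      # assumption: var is "basic", modelled as a string


class Const(Enum):             # const := uu | tt | ff | oo
    UU = "uu"
    TT = "tt"
    FF = "ff"
    OO = "oo"


@dataclass(frozen=True)
class MkBvar:                  # (op7) var -> bvar
    varof: Var


@dataclass(frozen=True)
class MkCterm:                 # (op1) const -> term
    constof: Const


@dataclass(frozen=True)
class MkVterm:                 # (op2) var -> term
    varof: Var


@dataclass(frozen=True)
class MkApply:                 # (op3) term x term -> term
    funeof: "Term"
    argeof: "Term"


@dataclass(frozen=True)
class MkLambda:                # (op4) bvar x term -> term
    bvarof: MkBvar
    termof: "Term"


@dataclass(frozen=True)
class MkMu:                    # (op5) bvar x term -> term
    bvarof: MkBvar
    termof: "Term"


@dataclass(frozen=True)
class MkCond:                  # (op6) term x term x term -> term
    condof: "Term"
    trueof: "Term"
    falseof: "Term"


Term = Union[MkCterm, MkVterm, MkApply, MkLambda, MkMu, MkCond]

# The term "lambda x. x(tt)" written with the constructors.
example = MkLambda(MkBvar("x"), MkApply(MkVterm("x"), MkCterm(Const.TT)))
```

Keeping `MkBvar` distinct from `Var` mirrors the separation of binding occurrences from other occurrences described above.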

From  the  data  type  definitions  the  following  characterizing  functions  are  generated: 

term :≡ [αF. [λx. is_const(x) ⇒ mkcterm(const(constof(x))),
                  is_mkvterm(x) ⇒ mkvterm(var(varof(x))),
                  is_mkapply(x) ⇒ mkapply(F(funeof(x)), F(argeof(x))),
                  is_mklambda(x) ⇒ mklambda(bvar(bvarof(x)), F(termof(x))),
                  is_mkmu(x) ⇒ mkmu(bvar(bvarof(x)), F(termof(x))),
                  is_mkcond(x) ⇒ mkcond(F(condof(x)), F(trueof(x)), F(falseof(x))),
                  ⊥ ]]

bvar :≡ [λx. mkbvar(var(varof(x)))]

In order to define a homomorphism we have to supply 7 operations of appropriate types. 6 of them
correspond to the constructors occurring in the characteristic function term; the last one is to replace
mkbvar. By substituting the characteristic function for bvar in term we obtain an expression that
includes all operations and completely defines homomorphic extension. For endomorphic extension
only the operations on the generating subtypes (i.e. op1, op2 and op7) are required. Let

termhom :≡ [λ op1 op2 ... op7. [αF. ... ]]

and

termend :≡ [λ op1 op2 op7. [αF. ... ]]

be the functionals for homomorphic and endomorphic extension.
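
A minimal Python sketch of the two functionals follows. It assumes a hypothetical tagged-tuple encoding of terms, `("cterm", c)`, `("vterm", v)`, `("apply", f, a)`, `("lambda", ("bvar", v), body)`, `("mu", ("bvar", v), body)` and `("cond", c, t, f)`, which is not taken from the paper. In this sketch the operations passed to `termend` return terms (or bvars) directly; that is one possible reading of the elided definition.

```python
def termhom(op1, op2, op3, op4, op5, op6, op7):
    """Homomorphic extension: every constructor is replaced by an operation."""
    def F(x):
        tag = x[0]
        if tag == "cterm":
            return op1(x[1])
        if tag == "vterm":
            return op2(x[1])
        if tag == "apply":
            return op3(F(x[1]), F(x[2]))
        if tag == "lambda":
            return op4(op7(x[1][1]), F(x[2]))
        if tag == "mu":
            return op5(op7(x[1][1]), F(x[2]))
        if tag == "cond":
            return op6(F(x[1]), F(x[2]), F(x[3]))
        raise ValueError(f"not a term: {x!r}")   # stands in for the undefined branch
    return F


def termend(op1, op2, op7):
    """Endomorphic extension: only the generating phyla change, constructors stay."""
    return termhom(
        op1, op2,
        lambda f, a: ("apply", f, a),
        lambda b, t: ("lambda", b, t),
        lambda b, t: ("mu", b, t),
        lambda c, t, f: ("cond", c, t, f),
        op7,
    )


mkcterm = lambda c: ("cterm", c)
mkvterm = lambda v: ("vterm", v)
mkbvar = lambda v: ("bvar", v)

# A homomorphism into the integers: the number of term constructors.
size = termhom(lambda c: 1, lambda v: 1, lambda f, a: 1 + f + a,
               lambda b, t: 1 + t, lambda b, t: 1 + t,
               lambda c, t, f: 1 + c + t + f, lambda v: None)

# An endomorphism: upper-case every variable name.
upper = termend(mkcterm, lambda v: mkvterm(v.upper()), lambda v: mkbvar(v.upper()))

t = ("lambda", ("bvar", "x"), ("apply", ("vterm", "x"), ("cterm", "tt")))
```

Passing the constructors themselves, `termend(mkcterm, mkvterm, mkbvar)`, yields the identity on well-formed terms.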


5.2  The  Problem

We  want  to  formalize  the  following  (cf.[AW]): 

Replace any free occurrence of the variable v in the expression (term) e by the term t after
renaming bound variables in e suitably (i.e. so that no free variable in t will become bound in e)

(a common notation is e[t/v]).

What is described above is the basic conversion rule of the λ-calculus as it is incorporated in the
LCF system. It may be desirable to have a system that is smart enough to understand this
description of substitution and to translate it from English into a programming language. At
present, such a system is not available. It would require knowledge about what exactly is meant by
"free occurrence", "replace", "renaming" etc. For the time being we have to be satisfied with specifying
those notions in some kind of formal language and having a less ambitious system transform the
specification statements into executable code. In any case, we need a formal definition in order to be
able to prove anything about the function.

We construct a function subst: var×term×term → term by stepwise specifying the informal notion in
our language. Let subst be defined by

subst :≡ [λv t e. substfree(v, t, renamebvar(e,t))]

where

substfree(v,t,e) :≡ "replace all free occurrences of v in e by t"

renamebvar(e,t) :≡ "rename bound variables in e that occur free in t appropriately"

a) bound variables in term. The function boundvarsin: term → setof(var) returns a set of variables for
which there is a binding occurrence in the term. This is just the standard set function set-of-bvar
composed with the isomorphism variso from bvar to var, extended to sets. Here we can see how the
separation of the type bvar from var facilitates definition of set-valued functions. set-of-bvar is the
homomorphism defined by the operations

b1 :≡ b2 :≡ [λx. {}]        (empty set)
b3 :≡ b4 :≡ b5 :≡ ∪
b6 :≡ ∪3
b7 :≡ [λx. {x}]             (singleton map)

i.e., set-of-bvar ≡ termhom(b1, ..., b7). Then

boundvarsin :≡ variso ∘ set-of-bvar


b) free variables in term. The standard function set-of-vars returns all occurring variables regardless
of whether they are free or not. So we have to update that function appropriately to get a function
that returns only free variables. If we had separated the λ- and α-terms from the type term we could
use a standard set-of-dtype function for defining freevarsin. Instead, we define it directly as a
homomorphism

freevarsin: term → setof(var).

Using the set-valued functions

f1 :≡ [λx. {}]
f2 :≡ f7 :≡ [λx. {x}]
f3 :≡ ∪
f4 :≡ f5 :≡ [λx y. y\x]     (set difference)
f6 :≡ ∪3

the function is definable by

freevarsin :≡ termhom(f1, ..., f7)
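
Both set-valued homomorphisms can be tried out in a short Python sketch. It assumes a hypothetical tagged-tuple encoding of terms (`("vterm", v)`, `("apply", f, a)`, `("lambda", ("bvar", v), body)` and so on) that is not part of the paper. Because the singleton map is applied to the variable underneath mkbvar, the isomorphism variso reduces to the identity in this encoding.

```python
def termhom(op1, op2, op3, op4, op5, op6, op7):
    """Replace every constructor of a tagged-tuple term by an operation."""
    def F(x):
        tag = x[0]
        if tag == "cterm":
            return op1(x[1])
        if tag == "vterm":
            return op2(x[1])
        if tag == "apply":
            return op3(F(x[1]), F(x[2]))
        if tag in ("lambda", "mu"):
            op = op4 if tag == "lambda" else op5
            return op(op7(x[1][1]), F(x[2]))
        if tag == "cond":
            return op6(F(x[1]), F(x[2]), F(x[3]))
        raise ValueError(f"not a term: {x!r}")
    return F


empty = lambda x: frozenset()            # the constant empty-set map
single = lambda x: frozenset({x})        # the singleton map
union = lambda a, b: a | b
union3 = lambda a, b, c: a | b | c
minus = lambda x, y: y - x               # set difference y \ x

# The operations b1..b7 and f1..f7 listed in the text.
set_of_bvar = termhom(empty, empty, union, union, union, union3, single)
freevarsin = termhom(empty, single, union, minus, minus, union3, single)

# The application (lambda x. x(y))(x): x occurs both bound and free.
lam = ("lambda", ("bvar", "x"), ("apply", ("vterm", "x"), ("vterm", "y")))
t = ("apply", lam, ("vterm", "x"))
```

Here `set_of_bvar(t)` contains only `x`, while `freevarsin(t)` contains `x` and `y`, since the outer occurrence of `x` is not in the scope of the binder.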

c) Renaming bound variables. We need a function newvar that "invents" new variables (which do not
occur in either e or t). Strictly speaking, the existence of newvar depends on a function that
enumerates all variables and returns the first element with a certain property. In any practical
implementation we "know" all the variable names available to the user, so a function that generates
new names is available. In the abstract context it is sufficient to assume the existence of a strict and
total function newvar that returns a variable with the property

newvar(v,e,t) ∉ varsin(e) ∪ varsin(t) ∪ {v}.
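
As an illustration of the enumeration argument, a fresh-name generator might be sketched in Python as follows. The naming scheme `v0, v1, ...` and the choice to pass the set of names to avoid directly (instead of the triple v, e, t) are assumptions of this sketch.

```python
from itertools import count


def newvar(used):
    """Return the first name in the enumeration v0, v1, v2, ... not in `used`.

    `used` is assumed to be a finite set standing for
    varsin(e) | varsin(t) | {v}; finiteness guarantees termination.
    """
    for i in count():
        candidate = f"v{i}"
        if candidate not in used:
            return candidate
```

For example, `newvar({"x", "v0", "v1"})` skips the two names already taken.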

Using this function we can specify renaming of bound variables:


renamevar(e,t) :≡ "rename in e each variable that occurs free in t and bound in e"

formally:

renamevar :≡ [λe t. [∀x ∈ freevarsin(t) ∩ bvarsin(e) : rename(t,x,e)]]
rename :≡ [λt. [λx e. termend(mkcterm, replacevar, mkbvar∘replacevar)]]

where replacevar denotes the term [λz. z=x ⇒ newvar(x,e,t), z]. Note that the use of the iteration
construction is justified by the fact that renaming of bound variables can be done in any order; all
resulting terms are equivalent.
d) substfree. "Free occurrence" means "not bound", i.e. "not in the range of a λ or α binding that
variable." So, in order to find occurrences of a variable v we have to search (recur in) the tree
representing the term e. Whenever we come across a λ or α (that is, a mklambda or mkmu) that binds
v, we stop and return. Then any remaining occurrence of v is a free one and is to be replaced by t.
In the formal language this is expressed by a construction using a modified functional for
endomorphic extension:

substfree :≡ [αS. [λv t. [λe. is_mkvterm(e) ⇒ (varof(e)=v ⇒ t, e),
                              (is_lambda(e) ∨ is_mu(e)) ∧ bvarof(e)=v ⇒ e,
                              termO(S(v,t), e)]]]

Here termO is the operator on F that defines term, i.e., term ≡ [αF. [λx. termO(F,x)]].

This finishes the formal specification of the substitution function. The collection of all the function
definitions

subst :≡ [λv t e. substfree(v,t,renamevar(e,t))]

substfree :≡ [αS. [λv t. [λe. is_mkvterm(e) ⇒ (varof(e)=v ⇒ t, e),
                              (is_lambda(e) ∨ is_mu(e)) ∧ bvarof(e)=v ⇒ e,
                              termO(S(v,t), e)]]]

renamevar :≡ [λe t. [∀x ∈ freevarsin(t) ∩ bvarsin(e) : rename(t,x,e)]]

rename :≡ [λt. [λx e. termend(mkcterm, replacevar, mkbvar∘replacevar)]]

replacevar :≡ [λz. z=x ⇒ newvar(x,e,t), z]

bvarsin :≡ variso ∘ set-of-bvar

set-of-bvar :≡ termhom([λx.{}], [λx.{}], ∪, ∪, ∪, ∪3, [λx.{x}])

freevarsin :≡ termhom([λx.{}], [λx.{x}], ∪, \, \, ∪3, [λx.{x}])

is somewhat longer than the informal description in English, yet it is complete in the sense that a
sufficiently smart system can transform it into a reasonably efficient program, using transformations
of the sort indicated in the preceding section.
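
To make the two-phase structure of subst concrete, here is a hedged Python sketch written by direct recursion, not by way of termhom or termend. It assumes a hypothetical tagged-tuple encoding in which binders carry the variable name directly, e.g. `("lambda", "x", body)`. It also deviates from the formal text in two labelled ways: `renamevar` renames a binder together with the occurrences in its scope only, and it takes v as an extra argument so that fresh names avoid it.

```python
from itertools import count


def freevars(e):
    tag = e[0]
    if tag == "cterm":
        return frozenset()
    if tag == "vterm":
        return frozenset({e[1]})
    if tag in ("lambda", "mu"):
        return freevars(e[2]) - {e[1]}
    return frozenset().union(*(freevars(s) for s in e[1:]))   # apply, cond


def allvars(e):
    tag = e[0]
    if tag == "cterm":
        return frozenset()
    if tag == "vterm":
        return frozenset({e[1]})
    if tag in ("lambda", "mu"):
        return allvars(e[2]) | {e[1]}
    return frozenset().union(*(allvars(s) for s in e[1:]))


def newvar(used):
    """First name v0, v1, ... not in `used` (assumed naming scheme)."""
    return next(f"v{i}" for i in count() if f"v{i}" not in used)


def substfree(v, t, e):
    """Replace the free occurrences of v in e by t, without any renaming."""
    tag = e[0]
    if tag == "vterm":
        return t if e[1] == v else e
    if tag == "cterm":
        return e
    if tag in ("lambda", "mu"):
        if e[1] == v:                    # v is rebound here: stop and return
            return e
        return (tag, e[1], substfree(v, t, e[2]))
    return (tag, *(substfree(v, t, s) for s in e[1:]))


def renamevar(e, t, v):
    """Rename every binder of e whose variable occurs free in t."""
    avoid = freevars(t)
    used = set(allvars(e) | allvars(t) | {v})

    def go(e):
        tag = e[0]
        if tag in ("cterm", "vterm"):
            return e
        if tag in ("lambda", "mu"):
            x, body = e[1], go(e[2])
            if x in avoid:
                fresh = newvar(used)
                used.add(fresh)
                return (tag, fresh, substfree(x, ("vterm", fresh), body))
            return (tag, x, body)
        return (tag, *(go(s) for s in e[1:]))

    return go(e)


def subst(v, t, e):
    """e[t/v]: rename clashing bound variables first, then substitute."""
    return substfree(v, t, renamevar(e, t, v))
```

Substituting `y` for `x` in `lambda y. x(y)` renames the binder first, so the inserted `y` stays free in the result.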
6.  Concluding  Remarks

In this paper, the representation of abstract data types in LCF and the algebraic interpretation of
structures were discussed. This led to constructions that permit functions operating on data
structures to be specified concisely and in a way close to what may be considered "natural." The
methods were demonstrated in an example taken from the actual LCF system.

The construction methods considered here constitute only a first step towards an elaborated language
that will allow easy and concise definition of complex functions as they are needed in, e.g., structure
manipulating systems. There are many directions in which the work presented here has to be
extended. Some have already been mentioned in the preceding sections: systematic extension of the
system of data types; more general function definition schemes; general methods for transforming and
optimizing function definitions, in particular for removal of redundant recursions; the translation of
logical expressions into a "real-life" programming language. In the paper, only methods for explicit
function definition have been discussed. However, it appears that techniques for solving equations
that define functions implicitly can similarly be derived from the explicit representation of the data
structure by a retract. The retract could serve for guiding the search for solutions and for
structuring the resulting program. The development of such problem solving methods in the
framework of LCF has to be left to future studies.

How many of the methods discussed here can be automated? It is obvious that the generation of the
appropriate set of axioms, of function definition schemes and of rules for structural induction from
the data type definitions is straightforward and can be completely automated. Furthermore, many
checks for simplifications and transformations can be done on a purely syntactic level accessible to
automation. So it should be easy to incorporate all these features and special knowledge about the
restricted set calculus into an interactive system for developing programs and proving theorems
about them.
A.  Appendices

A.1  Logic  for  Computable  Functions


A.1.1  Syntax  of  Typed  LCF

The  following  is  an  extract  taken  from  [Mil]. 


Types  At bottom tr and ind are types. Further, if β1 and β2 are types then (β1→β2) is a type.
With each term of the logic there is an unambiguously associated type. For a term t we write t:β to
mean that the type associated with t is β.


Terms (metavariables s,t,s1,t1,...)  The following are terms:


Identifiers (metavariables x,y) - sequences of upper or lower case letters and digits. We assume that
the type of each identifier is uniquely determined in some manner.


Applications - s(t) : β2, where s:(β1→β2) and t:β1.

Conditionals - (s→t1,t2) : β, where s:tr and t1,t2:β.

λ-expressions - [λx.s] : (β1→β2), where x:β1 and s:β2.

α-expressions - [αx.s] : β, where x,s:β.


The intended interpretation of the α-expression is the minimal fixed-point of the function or
functional denoted by [λf.s]. For example:


[αf.[λx.(p(x)→f(a(x)),b(x))]]

denotes  the  function  defined  recursively  as  follows: 
f(x)  <-  if  p(x)  then  f(a(x))  else  b(x). 
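
The minimal fixed point can be pictured as the limit of finite unfoldings, starting from the totally undefined function. The Python sketch below illustrates this for the example above; modelling the undefined object by `None` and the concrete choices of p, a and b are assumptions made purely for illustration.

```python
BOTTOM = None          # assumption: None models the undefined object


def bottom(_x):
    """The totally undefined function."""
    return BOTTOM


def tau(f, p, a, b):
    """One unfolding: maps f to the function x -> (p(x) -> f(a(x)), b(x))."""
    return lambda x: f(a(x)) if p(x) else b(x)


def approximation(n, p, a, b):
    """tau applied n times to the totally undefined function."""
    f = bottom
    for _ in range(n):
        f = tau(f, p, a, b)
    return f


# Example instance (hypothetical): count down to zero, then answer "done".
p = lambda x: x > 0
a = lambda x: x - 1
b = lambda x: "done"
```

The n-th approximation is defined exactly on the arguments that need fewer than n recursive calls; the fixed point agrees with every approximation wherever the latter is defined.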


Constants  The identifiers tt, ff denote truthvalues true and false. ⊥ denotes the totally undefined
object of any type: in particular, the undefined truthvalue.


Atomic well-formed formulae (awffs)  The following is an awff:

s ⊑ t


where s and t are of the same type. The intended interpretation of s⊑t is, roughly, that t is at least as
well defined as, and consistent with, s.


Well-formed formulae (wffs) (metavariables P,Q,P1,Q1,...)  Wffs are sets of zero or more awffs,
written as lists with separating commas. They are interpreted as conjunctions. We use

s ≡ t


to abbreviate s⊑t, t⊑s.


Sentences  Sentences are implications between wffs, written

P ⊢ Q

or, if P is empty, just ⊢ Q.

Proofs  A proof  is  a sequence  of  sentences,  each  being  derived  from  zero  or  more  preceding 
sentences  by  a rule  of  inference. 

The strict syntax for terms and awffs is relaxed in the machine implementation to allow a saving of
parentheses and brackets. In addition, we use the abbreviations

f(x,y) for f(x)(y)

∀x. s ≡ t for λx.s ≡ λx.t

p ::  q I r for  p 3 ■ P ^ 

Functions  are  used  in  infix  notation  where  it  is  obvious  what  is  meant. 


A.1.2  Type-free  LCF

The type-free version of LCF differs from the typed one essentially in the handling of truth values and
conditional expressions. Apart from that it also specifies the structure of the domain. Besides the truth
values there are constants T for the truth values retract and I for the universal domain. In the
following the additional axioms and rules of inference are listed.


MAX  I-  s c T 

COND  I-  T 8,8  ■ 8 

|-  r -*  8,t  ■ T(r)  -*  8,t 


I-  T(s)(t)  ■ r(8) 

1-  T c T -»  T 

1-  I i [ocJ.  T u (J  -*  J)] 


The  CASES-rule  is  changed  to 


CASES 


p I-  Q{x/x)  p 1-  Q{tt/x)  p I- 

P t-  Q{T/x) 


J 
A.2  Special  Functions  in  tfLCF


A.2.1  Definedness  predicate 
We  want  a predicate  d such  that 


d(x) ≡ tt,  x ≢ ⊥,  x ≢ ⊤


Define  d by 

d ≡ upt ⊔ down

where 

down  • [X  X.  X,x] 

upt ≡ [αP. [λx. (x ⇒ tt, tt) ⊔ P(x(⊤))]]   (≡ [λx. uptf(x) ⇒ tt, tt])


down maps everything to ⊥ except ⊤ which goes to ⊤:

down(x) ≡ ⊤  ⊢  T(x) ≡ ⊤  ⊢  x ≡ ⊤


upt maps everything to tt except ⊥ which is mapped to ⊥. The desired properties of d are then
obvious.


A.2.4  istrue 

Our  aim  is  to  give  a function  that 

- maps everything on a truth value and

- gives the values tt and ff exactly for the arguments tt and ff resp.

This function will enable us to test variables effectively for "well defined" truth values. In the
type-free logic the simple conditional does not provide this function as it is defined "relative to the
truth values retract T". However, we can define istrue using a limit construction. That such a
definition is possible at all is due to the fact that the truth values are isolated points in the lattice I.


Definition:  istrue ≡ [αS. [λx. x ⇒ tt ⊔ S(x ⊤), ff ⊔ S(x ⊤)]]


It is easy to show by cases that

(1)  T ⊑ istrue

Since ∀x. T(x) ⊑ x we also have

(2)  T ∘ istrue ⊑ istrue

From the definition follows immediately

(3a)  istrue(x) ≡ tt ⊢ istrue(x(⊤)) ⊑ istrue(x)

(3b)  istrue(x) ≡ ff ⊢ istrue(x(⊤)) ⊑ istrue(x)




Next we show by induction on istrue

(5)  ∀x. istrue(x) ⊑ T(istrue(x))

I.   ⊥(x) ⊑ ...   ok.

II.  Assume ∀x. S(x) ⊑ T(istrue(x)). We have to show

         x ⇒ tt ⊔ S(x ⊤), ff ⊔ S(x ⊤)  ⊑  T(istrue(x)).

     By cases T(x):

     T(x) ≡ ⊥  :  trivial

     T(x) ≡ ⊤  :  implies istrue(x) ≡ ⊤, trivial.

     T(x) ≡ tt :  tt ⊔ S(x ⊤)  ⊑  tt ⊔ T(istrue(x ⊤))    by Ind.Hyp.
                               ≡  T(tt ⊔ istrue(x ⊤))    by L54 Mi-We
                               ≡  T(istrue(x)).

     T(x) ≡ ff :  analogous

With (2) we have shown

(6)  istrue ≡ T ∘ istrue,

which means that the range of istrue is a set of truth-values.

On the other hand we already mentioned that

(6a)  istrue(tv) ≡ tv

holds for each truth value tv. Thus, in a short notation

(7)  istrue ∘ T ≡ T


i.e. istrue is an identity on T. From (6) and (7) we deduce the retract property for istrue:

(8)  istrue ∘ istrue ≡ istrue ∘ (T ∘ istrue)
                     ≡ (istrue ∘ T) ∘ istrue
                     ≡ T ∘ istrue
                     ≡ istrue
The ultimate goal is to show

(*)  istrue(x) ≡ tt ⊢ x ≡ tt
     istrue(x) ≡ ff ⊢ x ≡ ff

i.e., istrue is a truth-valued function that gives the values tt or ff exactly for tt and ff resp. In order
to do so we introduce another truth-valued function:

Definition:  uptf ≡ [αP. [λx. T(x) ⊔ P(x(⊤))]]

By  definition  we  have 

(U1)  T ⊑ uptf

We  prove  the  following  facts  about  uptf: 

(U2)  X c uptf 

(9)  istrue ⊑ uptf

(10)  istrue(x) ≡ tt ⊢ uptf(x) ⊑ istrue(x)

(9) and (10) together show

(11)  istrue(x) ≡ tt ⊢ uptf(x) ≡ tt

With (U2) it follows that

(12)  x ⊑ tt

on the other hand, since tt ≡ T(x) ⊑ x we have

(13)  istrue(x) ≡ tt ⊢ x ≡ tt.

The  proof  for  the  corresponding  statement  for  ff  follows  the  same  line. 

A.3  Structural  induction 

The basic idea of how to do structural induction in LCF is that it can actually be simulated if a
recursive function "describing" the structure is available. For the kind of structures we are interested
in here, the retraction constructed from the type definition serves this purpose. So, structural
induction becomes a mere application of computational induction. The derivation of the induction
rule as in theorem 3.x is done in two steps: 1) first derive a rule involving the retraction; 2) modify
the rule in 1) by using the type-predicate. Since proving the rule in full generality would be rather
tedious, it is demonstrated by means of the example seq.


Recall that seq is defined by the retraction


seq ≡ [αS. [λx. is_emptyseq(x) ⇒ emptyseq,
                is_mkseq(x) ⇒ mkseq(atom(hd(x)), S(tl(x))),
                ⊥]]
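
Read operationally, such a retraction acts as a normalizer: it returns well-formed sequences unchanged and sends everything else to the undefined object. A small Python sketch under stated assumptions: `None` models the undefined object, atoms are taken to be integers, sequences are tagged tuples, and mkseq is treated as strict. None of these choices comes from the paper, and the overdefined element is not modelled.

```python
BOTTOM = None                      # assumption: None models the undefined object


def atom(x):
    """Retraction for atoms, assumed here to be Python ints (excluding bool)."""
    return x if isinstance(x, int) and not isinstance(x, bool) else BOTTOM


def seq(x):
    """Retraction for seq: fixes well-formed sequences, maps the rest to BOTTOM."""
    if x == ("emptyseq",):
        return x
    if isinstance(x, tuple) and len(x) == 3 and x[0] == "mkseq":
        hd, tl = atom(x[1]), seq(x[2])
        if hd is BOTTOM or tl is BOTTOM:      # strict constructor (assumption)
            return BOTTOM
        return ("mkseq", hd, tl)
    return BOTTOM


good = ("mkseq", 1, ("mkseq", 2, ("emptyseq",)))
bad = ("mkseq", "a", ("emptyseq",))
```

The retract property shows up as idempotence: applying `seq` a second time changes nothing.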


First,  we  prove  the  rule 

(a1) P(⊥)    (a2) P(⊤)    (a3) P(emptyseq)    (a4) ∀x.P(y) ⊢ ∀x.P(mkseq(x,y))
------------------------------------------------------------------------------    (R1)
                                 P(seq(x))


where x and y do not occur in P. By computational induction, we can deduce

∀x. P(seq(x))

from

∀x. P(⊥ x)   and   ∀x. P(S x) ⊢ ∀x. P(tau(S)(x)).


Since

P(⊥ x) <=> P(⊥)

the base case is proved by premise (a1).


Now assume ∀x. P(S x). In order to prove ∀x. P(tau(S)(x)) we expand tau(S) to

tau(S)(x) ≡ is_emptyseq(x) ⇒ emptyseq,
            is_mkseq(x) ⇒ mkseq(atom(hd(x)), S(tl(x))),
            ⊥


and split into cases which then can be deduced from appropriate premises:

is_emptyseq(x) ≡ ⊥  :  tau(S)(x) ≡ ⊥          by premise (a1)
               ≡ tt :            ≡ emptyseq   by (a3)
               ≡ ⊤  :            ≡ ⊤          by (a2)
               ≡ ff :  case split for is_mkseq(x); only tt is interesting:
                                 ≡ mkseq(atom(y), S(x))   by (a4)


Now, the rule

(b1) P(emptyseq)    (b2) P(y) ⊢ P(mkseq(x,y))
----------------------------------------------
            ∀x. is_seq(x) ⇒ P(x)

follows from (R1) by virtue of the fact that the relativizing type predicate eliminates ⊥ and ⊤.